Applies a multi-layer Elman RNN with \(\tanh\) or \(\mbox{ReLU}\) non-linearity to an input sequence.

Usage

nn_rnn(
  input_size,
  hidden_size,
  num_layers = 1,
  nonlinearity = NULL,
  bias = TRUE,
  batch_first = FALSE,
  dropout = 0,
  bidirectional = FALSE,
  ...
)

Arguments

input_size

The number of expected features in the input x

hidden_size

The number of features in the hidden state h

num_layers

Number of recurrent layers. E.g., setting num_layers=2 would mean stacking two RNNs together to form a stacked RNN, with the second RNN taking in outputs of the first RNN and computing the final results. Default: 1

nonlinearity

The non-linearity to use. Can be either 'tanh' or 'relu'. Default: 'tanh'

bias

If FALSE, then the layer does not use bias weights b_ih and b_hh. Default: TRUE

batch_first

If TRUE, then the input and output tensors are provided as (batch, seq, feature); see the sketch after this list. Default: FALSE

dropout

If non-zero, introduces a Dropout layer on the outputs of each RNN layer except the last layer, with dropout probability equal to dropout. Default: 0

bidirectional

If TRUE, becomes a bidirectional RNN. Default: FALSE

...

Other arguments that can be passed to the super class.
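
A minimal sketch of how batch_first = TRUE changes the expected layout; the commented shapes follow from the argument descriptions above and the Inputs/Outputs section below, and are meant as an illustration rather than a definitive reference:

library(torch)
if (torch_is_installed()) {
# with batch_first = TRUE the input is laid out as (batch, seq, feature)
rnn <- nn_rnn(input_size = 10, hidden_size = 20, num_layers = 2, batch_first = TRUE)
input <- torch_randn(3, 5, 10)  # (batch = 3, seq = 5, feature = 10)
res <- rnn(input)               # h_0 defaults to zeros when not supplied
res[[1]]$shape                  # 3 5 20: output keeps the (batch, seq, feature) layout
res[[2]]$shape                  # 2 3 20: h_n remains (num_layers * num_directions, batch, hidden_size)
}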

Details

For each element in the input sequence, each layer computes the following function:

$$ h_t = \tanh(W_{ih} x_t + b_{ih} + W_{hh} h_{(t-1)} + b_{hh}) $$

where \(h_t\) is the hidden state at time t, \(x_t\) is the input at time t, and \(h_{(t-1)}\) is the hidden state of the previous layer at time t-1 or the initial hidden state at time 0. If nonlinearity is 'relu', then \(\mbox{ReLU}\) is used instead of \(\tanh\).
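
As an illustration, one such \(\tanh\) step can be written out with plain tensor operations; the names W_ih, W_hh, b_ih and b_hh below are illustrative stand-ins, not the module's stored parameters:

library(torch)
if (torch_is_installed()) {
input_size <- 10
hidden_size <- 20
W_ih <- torch_randn(hidden_size, input_size)
W_hh <- torch_randn(hidden_size, hidden_size)
b_ih <- torch_zeros(hidden_size)
b_hh <- torch_zeros(hidden_size)
x_t <- torch_randn(input_size)      # input at time t
h_prev <- torch_zeros(hidden_size)  # hidden state at time t - 1
# one Elman update for a single element and a single layer
h_t <- torch_tanh(torch_matmul(W_ih, x_t) + b_ih + torch_matmul(W_hh, h_prev) + b_hh)
h_t$shape  # hidden_size
}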

Inputs

  • input of shape (seq_len, batch, input_size): tensor containing the features of the input sequence. The input can also be a packed variable length sequence.

  • h_0 of shape (num_layers * num_directions, batch, hidden_size): tensor containing the initial hidden state for each element in the batch. Defaults to zero if not provided. If the RNN is bidirectional, num_directions should be 2, else it should be 1.

Outputs

  • output of shape (seq_len, batch, num_directions * hidden_size): tensor containing the output features (h_t) from the last layer of the RNN, for each t. If a nn_packed_sequence has been given as the input, the output will also be a packed sequence. For the unpacked case, the directions can be separated using output$view(seq_len, batch, num_directions, hidden_size), with forward and backward being direction 0 and 1 respectively. Similarly, the directions can be separated in the packed case.

  • h_n of shape (num_layers * num_directions, batch, hidden_size): tensor containing the hidden state for t = seq_len. Like output, the layers can be separated using h_n$view(num_layers, num_directions, batch, hidden_size).
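
A sketch of separating the two directions in the unpacked bidirectional case, following the shapes above; note that after the view the direction index is 1-based in R, so forward is index 1 and backward is index 2:

library(torch)
if (torch_is_installed()) {
rnn <- nn_rnn(input_size = 10, hidden_size = 20, num_layers = 2, bidirectional = TRUE)
input <- torch_randn(5, 3, 10)   # (seq_len, batch, input_size)
h0 <- torch_zeros(2 * 2, 3, 20)  # (num_layers * num_directions, batch, hidden_size)
res <- rnn(input, h0)
output <- res[[1]]               # (5, 3, num_directions * hidden_size)
h_n <- res[[2]]                  # (4, 3, 20)
dirs <- output$view(c(5, 3, 2, 20))
forward_out <- dirs[ , , 1, ]    # (5, 3, 20)
backward_out <- dirs[ , , 2, ]   # (5, 3, 20)
}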

Shape

  • Input1: \((L, N, H_{in})\) tensor containing input features, where \(H_{in}=\mbox{input\_size}\) and L is the sequence length.

  • Input2: \((S, N, H_{out})\) tensor containing the initial hidden state for each element in the batch, where \(S=\mbox{num\_layers} * \mbox{num\_directions}\) and \(H_{out}=\mbox{hidden\_size}\). Defaults to zero if not provided. If the RNN is bidirectional, num_directions should be 2, else it should be 1.

  • Output1: \((L, N, H_{all})\) where \(H_{all}=\mbox{num\_directions} * \mbox{hidden\_size}\).

  • Output2: \((S, N, H_{out})\) tensor containing the next hidden state for each element in the batch.

Attributes

  • weight_ih_l[k]: the learnable input-hidden weights of the k-th layer, of shape (hidden_size, input_size) for k = 0. Otherwise, the shape is (hidden_size, num_directions * hidden_size)

  • weight_hh_l[k]: the learnable hidden-hidden weights of the k-th layer, of shape (hidden_size, hidden_size)

  • bias_ih_l[k]: the learnable input-hidden bias of the k-th layer, of shape (hidden_size)

  • bias_hh_l[k]: the learnable hidden-hidden bias of the k-th layer, of shape (hidden_size)
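
These parameters can be inspected from R. A sketch, assuming the names listed above (with 0-based layer indices, e.g. weight_ih_l0) appear verbatim in the module's parameter list:

library(torch)
if (torch_is_installed()) {
rnn <- nn_rnn(input_size = 10, hidden_size = 20, num_layers = 2)
names(rnn$parameters)              # e.g. "weight_ih_l0", "weight_hh_l0", "bias_ih_l0", ...
rnn$parameters$weight_ih_l0$shape  # (hidden_size, input_size) for the first layer
rnn$parameters$weight_ih_l1$shape  # (hidden_size, num_directions * hidden_size) for later layers
rnn$parameters$bias_hh_l0$shape    # (hidden_size)
}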

Note

All the weights and biases are initialized from \(\mathcal{U}(-\sqrt{k}, \sqrt{k})\) where \(k = \frac{1}{\mbox{hidden\_size}}\).
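
For example, with hidden_size = 20 (as in the example below) the bound evaluates to:

sqrt(1 / 20)  # ~0.224, so every weight and bias starts in (-0.224, 0.224)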

Examples

if (torch_is_installed()) {
rnn <- nn_rnn(10, 20, 2)
input <- torch_randn(5, 3, 10)
h0 <- torch_randn(2, 3, 20)
rnn(input, h0)
}
#> [[1]]
#> torch_tensor
#> (1,.,.) = 
#>  Columns 1 to 9 -0.7760  0.7453 -0.8135 -0.1260 -0.6849 -0.5407 -0.1798 -0.0782  0.3087
#>  -0.3387 -0.4634  0.5041  0.6188  0.7599 -0.4540 -0.9372  0.2091 -0.5749
#>   0.1445  0.8924 -0.4224  0.1523  0.2014  0.1200 -0.3852  0.0442  0.6321
#> 
#> Columns 10 to 18 -0.4789  0.8594 -0.0204 -0.6847 -0.5105  0.6508 -0.3157 -0.7205 -0.6654
#>   0.9237  0.5144 -0.8931 -0.5068  0.3923 -0.2059 -0.3604  0.8662 -0.0582
#>   0.9189 -0.0485 -0.1994 -0.2125 -0.4593  0.4590  0.8533  0.8538  0.2311
#> 
#> Columns 19 to 20  0.8080 -0.2104
#>   0.3382 -0.4767
#>   0.2500 -0.3892
#> 
#> (2,.,.) = 
#>  Columns 1 to 9 -0.2815  0.5201 -0.6495  0.0351 -0.1513 -0.2072 -0.2565 -0.3849 -0.1523
#>   0.0071  0.7487 -0.0252 -0.1314 -0.5933 -0.0914 -0.2361 -0.2672  0.3404
#>  -0.0799  0.6846 -0.0745  0.5310  0.2673 -0.4345  0.0682  0.2624  0.2324
#> 
#> Columns 10 to 18 -0.7287  0.2205 -0.6220  0.4348 -0.4572 -0.1141 -0.0819 -0.0709  0.6546
#>   0.2467  0.7273  0.2330  0.0524 -0.0720 -0.1960 -0.1301 -0.4423 -0.6132
#>   0.4452  0.6180 -0.4571  0.0999  0.0538  0.2452  0.1344 -0.4337  0.1214
#> 
#> Columns 19 to 20  0.0708 -0.5350
#>  -0.0706  0.1784
#>   0.6142 -0.4183
#> 
#> (3,.,.) = 
#>  Columns 1 to 9 -0.4763  0.5340  0.1026  0.4987 -0.1831 -0.2551 -0.0806 -0.4321 -0.3496
#>  -0.2620  0.2877 -0.2450 -0.0023  0.3150 -0.1754 -0.2119  0.1692 -0.0932
#>  -0.1292  0.6732 -0.2190 -0.0891  0.0655  0.1022 -0.3270  0.0892  0.0378
#> ... [the output was truncated (use n=-1 to disable)]
#> [ CPUFloatType{5,3,20} ][ grad_fn = <StackBackward0> ]
#> 
#> [[2]]
#> torch_tensor
#> (1,.,.) = 
#>  Columns 1 to 9  0.4751 -0.8483  0.0879  0.6429 -0.5682  0.6717 -0.4121 -0.2049  0.6930
#>  -0.7307 -0.6716 -0.1829  0.5269 -0.7630  0.3690 -0.6108  0.0137  0.8703
#>  -0.8678  0.0393  0.3725 -0.5869  0.2251 -0.1100  0.0490  0.7668  0.1678
#> 
#> Columns 10 to 18  0.6443 -0.7770 -0.4368  0.3099 -0.6685 -0.5122  0.2409 -0.7944  0.0299
#>  -0.0300 -0.6571  0.1197 -0.4809 -0.4327 -0.1425  0.7034 -0.6868 -0.6755
#>  -0.6285  0.0818  0.4730 -0.1781  0.2157  0.4803  0.0702  0.2968 -0.2557
#> 
#> Columns 19 to 20  0.0032 -0.1562
#>  -0.1597  0.1578
#>   0.1684 -0.4400
#> 
#> (2,.,.) = 
#>  Columns 1 to 9  0.2249  0.6148  0.0257  0.2643  0.0557  0.4933 -0.0232 -0.2164 -0.1008
#>   0.2075  0.4470 -0.1378  0.3002  0.5191  0.1169 -0.0708 -0.0279  0.0517
#>  -0.3448 -0.0203 -0.3674  0.1453  0.3233 -0.5030 -0.1293  0.1307 -0.1935
#> 
#> Columns 10 to 18  0.5867  0.6088 -0.8176  0.1219  0.3052  0.4675 -0.0821 -0.6279 -0.4466
#>   0.6094  0.5894 -0.3966 -0.2328  0.5532  0.4560  0.4582 -0.5609 -0.2418
#>   0.3690  0.0953 -0.0665  0.2931  0.0232 -0.4560  0.1377  0.2364  0.6089
#> 
#> Columns 19 to 20  0.5521  0.0811
#>  -0.0332 -0.5511
#>  -0.0849 -0.7609
#> [ CPUFloatType{2,3,20} ][ grad_fn = <StackBackward0> ]
#>