Applies a multi-layer Elman RNN with \(\tanh\) or \(\mbox{ReLU}\) non-linearity to an input sequence.

Usage

nn_rnn(
  input_size,
  hidden_size,
  num_layers = 1,
  nonlinearity = NULL,
  bias = TRUE,
  batch_first = FALSE,
  dropout = 0,
  bidirectional = FALSE,
  ...
)

Arguments

input_size

The number of expected features in the input x

hidden_size

The number of features in the hidden state h

num_layers

Number of recurrent layers. E.g., setting num_layers=2 would mean stacking two RNNs together to form a stacked RNN, with the second RNN taking in outputs of the first RNN and computing the final results. Default: 1

nonlinearity

The non-linearity to use. Can be either 'tanh' or 'relu'. Default: 'tanh'

bias

If FALSE, then the layer does not use bias weights b_ih and b_hh. Default: TRUE

batch_first

If TRUE, then the input and output tensors are provided as (batch, seq, feature); see the sketch following these arguments. Default: FALSE

dropout

If non-zero, introduces a Dropout layer on the outputs of each RNN layer except the last layer, with dropout probability equal to dropout. Default: 0

bidirectional

If TRUE, becomes a bidirectional RNN. Default: FALSE

...

other arguments that can be passed to the super class.
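A minimal sketch of the batch_first layout referenced above, assuming the default initial hidden state (zeros) when h0 is omitted; the sizes are illustrative:

library(torch)
rnn <- nn_rnn(10, 20, batch_first = TRUE)
input <- torch_randn(3, 5, 10)   # (batch = 3, seq_len = 5, input_size = 10)
out <- rnn(input)                # h0 defaults to zeros when omitted
out[[1]]$shape                   # (3, 5, 20): (batch, seq_len, hidden_size)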

Details

For each element in the input sequence, each layer computes the following function:

$$ h_t = \tanh(W_{ih} x_t + b_{ih} + W_{hh} h_{(t-1)} + b_{hh}) $$

where \(h_t\) is the hidden state at time t, \(x_t\) is the input at time t, and \(h_{(t-1)}\) is the hidden state of the layer at time t-1, or the initial hidden state at time 0. If nonlinearity is 'relu', then \(\mbox{ReLU}\) is used instead of \(\tanh\).
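The recurrence can be checked by hand. Below is a minimal sketch that evaluates one time step of a single-layer RNN manually and compares it with the module's result; the parameter names weight_ih_l1, weight_hh_l1, bias_ih_l1 and bias_hh_l1 are assumed to follow the attribute naming described below (check names(rnn$parameters) on your installation if they differ):

library(torch)
rnn <- nn_rnn(input_size = 4, hidden_size = 3, num_layers = 1)
x  <- torch_randn(1, 1, 4)   # (seq_len = 1, batch = 1, input_size)
h0 <- torch_zeros(1, 1, 3)   # (num_layers * num_directions, batch, hidden_size)
h1 <- rnn(x, h0)[[2]]        # hidden state after the single time step

# manual evaluation of h_t = tanh(W_ih x_t + b_ih + W_hh h_(t-1) + b_hh)
# (parameter names are assumed; see names(rnn$parameters))
p <- rnn$parameters
h1_manual <- torch_tanh(
  torch_matmul(p$weight_ih_l1, x[1, 1, ]) + p$bias_ih_l1 +
    torch_matmul(p$weight_hh_l1, h0[1, 1, ]) + p$bias_hh_l1
)
torch_allclose(h1$squeeze(), h1_manual, atol = 1e-6)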

Inputs

  • input of shape (seq_len, batch, input_size): tensor containing the features of the input sequence. The input can also be a packed variable length sequence.

  • h_0 of shape (num_layers * num_directions, batch, hidden_size): tensor containing the initial hidden state for each element in the batch. Defaults to zero if not provided. If the RNN is bidirectional, num_directions should be 2, else it should be 1.

Outputs

  • output of shape (seq_len, batch, num_directions * hidden_size): tensor containing the output features (h_t) from the last layer of the RNN, for each t. If a packed sequence (e.g. created with nn_utils_rnn_pack_padded_sequence()) has been given as the input, the output will also be a packed sequence. For the unpacked case, the directions can be separated using output$view(c(seq_len, batch, num_directions, hidden_size)), with forward and backward being direction 0 and 1 respectively (see the sketch following this list). Similarly, the directions can be separated in the packed case.

  • h_n of shape (num_layers * num_directions, batch, hidden_size): tensor containing the hidden state for t = seq_len. Like output, the layers can be separated using h_n$view(c(num_layers, num_directions, batch, hidden_size)).
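A minimal sketch of the direction split described above, assuming the first slot along the directions dimension is the forward pass (note that R tensor indexing is 1-based):

library(torch)
rnn <- nn_rnn(10, 20, bidirectional = TRUE)
input <- torch_randn(5, 3, 10)          # (seq_len, batch, input_size)
output <- rnn(input)[[1]]               # (5, 3, 2 * 20)
dirs <- output$view(c(5, 3, 2, 20))     # (seq_len, batch, num_directions, hidden_size)
forward_out  <- dirs[ , , 1, ]          # forward direction
backward_out <- dirs[ , , 2, ]          # backward direction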

Shape

  • Input1: \((L, N, H_{in})\) tensor containing input features, where \(H_{in}=\mbox{input\_size}\) and L is the sequence length.

  • Input2: \((S, N, H_{out})\) tensor containing the initial hidden state for each element in the batch, where \(S=\mbox{num\_layers} * \mbox{num\_directions}\) and \(H_{out}=\mbox{hidden\_size}\). Defaults to zeros if not provided. If the RNN is bidirectional, num_directions is 2, otherwise it is 1.

  • Output1: \((L, N, H_{all})\) where \(H_{all}=\mbox{num\_directions} * \mbox{hidden\_size}\)

  • Output2: \((S, N, H_{out})\) tensor containing the next hidden state for each element in the batch

Attributes

  • weight_ih_l[k]: the learnable input-hidden weights of the k-th layer, of shape (hidden_size, input_size) for k = 0. Otherwise, the shape is (hidden_size, num_directions * hidden_size)

  • weight_hh_l[k]: the learnable hidden-hidden weights of the k-th layer, of shape (hidden_size, hidden_size)

  • bias_ih_l[k]: the learnable input-hidden bias of the k-th layer, of shape (hidden_size)

  • bias_hh_l[k]: the learnable hidden-hidden bias of the k-th layer, of shape (hidden_size)
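The attribute names and shapes can be inspected directly on a constructed module. A quick sketch (the layer index in the R port may be 1-based rather than the k = 0 convention above, so listing names(rnn$parameters) is the reliable check):

library(torch)
rnn <- nn_rnn(10, 20, num_layers = 2)
names(rnn$parameters)                        # weight_ih_l*, weight_hh_l*, bias_ih_l*, bias_hh_l*
lapply(rnn$parameters, function(p) dim(p))   # shape of each learnable parameter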

Note

All the weights and biases are initialized from \(\mathcal{U}(-\sqrt{k}, \sqrt{k})\), where \(k = \frac{1}{\mbox{hidden\_size}}\).
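A small sketch to check this empirically on a freshly constructed module; the bound \(\sqrt{k}\) is computed from hidden_size as stated above:

library(torch)
hidden_size <- 20
rnn <- nn_rnn(10, hidden_size)
bound <- sqrt(1 / hidden_size)
# every weight and bias should lie within [-sqrt(k), sqrt(k)]
all(sapply(rnn$parameters, function(p) p$abs()$max()$item() <= bound))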

Examples

if (torch_is_installed()) {
rnn <- nn_rnn(10, 20, 2)
input <- torch_randn(5, 3, 10)
h0 <- torch_randn(2, 3, 20)
rnn(input, h0)
}
#> [[1]]
#> torch_tensor
#> (1,.,.) = 
#>  Columns 1 to 9  0.6210 -0.0224  0.0897 -0.1271  0.3904  0.4547  0.0283 -0.2772 -0.6163
#>  -0.1755  0.4043  0.3834  0.8691  0.3629  0.2580 -0.7042  0.7463 -0.9218
#>   0.1841 -0.2091  0.3950  0.6996 -0.1938 -0.3368 -0.0722  0.0966  0.5814
#> 
#> Columns 10 to 18 -0.7900  0.6862  0.3829 -0.4672 -0.5027 -0.0302  0.1479  0.5411  0.8091
#>  -0.2750  0.7744  0.3291 -0.6651 -0.7880 -0.8269  0.6105 -0.0585  0.8671
#>   0.3096 -0.6060 -0.7932 -0.7154  0.0669 -0.2380  0.4638 -0.7161  0.2325
#> 
#> Columns 19 to 20  0.3557 -0.6474
#>   0.4251 -0.3812
#>   0.5579  0.4685
#> 
#> (2,.,.) = 
#>  Columns 1 to 9  0.3790 -0.3426  0.1700  0.4097  0.5899 -0.0918  0.0235 -0.3307  0.1852
#>   0.7111  0.4291  0.6466 -0.1516  0.5628 -0.4047 -0.5729 -0.6285  0.3345
#>   0.3608  0.3466  0.6241 -0.1323  0.6284 -0.3303  0.2279 -0.2417 -0.3917
#> 
#> Columns 10 to 18 -0.1727 -0.2049 -0.4917 -0.5548 -0.6877  0.1381  0.0294 -0.2666  0.1013
#>  -0.1716  0.5030 -0.6041 -0.2076 -0.3981  0.4964 -0.4218 -0.1944  0.1945
#>   0.1343  0.5193 -0.0805 -0.2428 -0.5561  0.3652  0.1352  0.0685  0.3989
#> 
#> Columns 19 to 20  0.5835  0.0378
#>   0.5256  0.3709
#>  -0.2283 -0.0702
#> 
#> (3,.,.) = 
#>  Columns 1 to 9  0.4857 -0.2846  0.0669  0.4627  0.6227 -0.5060  0.1117 -0.6604  0.3297
#>   0.0060 -0.2215  0.5456  0.0629  0.4363 -0.3732  0.6397 -0.0397  0.3483
#>  -0.2677 -0.4605  0.5429  0.0710  0.1766 -0.5250  0.3277  0.0921  0.3027
#> ... [the output was truncated (use n=-1 to disable)]
#> [ CPUFloatType{5,3,20} ][ grad_fn = <StackBackward0> ]
#> 
#> [[2]]
#> torch_tensor
#> (1,.,.) = 
#>  Columns 1 to 9 -0.5733  0.0030 -0.2568 -0.2353 -0.4492  0.6114  0.4159  0.2690  0.2425
#>  -0.2804 -0.6375 -0.2451 -0.4571  0.6715  0.1292  0.8460  0.0487 -0.1788
#>   0.1642  0.1817  0.3654  0.2703  0.2841  0.0954 -0.3233 -0.2613  0.1110
#> 
#> Columns 10 to 18 -0.6024  0.0637  0.3268  0.4739 -0.4387 -0.2510 -0.0992  0.4245 -0.2168
#>  -0.6137  0.8388  0.1329  0.2408  0.3305 -0.1706 -0.1766  0.2808  0.0887
#>   0.4767 -0.1862  0.6478  0.2556  0.7990  0.2032 -0.2167 -0.4625  0.1031
#> 
#> Columns 19 to 20  0.1843 -0.0013
#>  -0.3344 -0.2012
#>  -0.2496  0.1680
#> 
#> (2,.,.) = 
#>  Columns 1 to 9  0.4788  0.2520  0.6893  0.1731  0.2744 -0.6801  0.1421  0.1439 -0.1920
#>   0.4706 -0.3598  0.4749  0.4556  0.3389 -0.7676  0.0278 -0.0828  0.2422
#>  -0.2397 -0.2356  0.5201  0.0636  0.2354 -0.4438  0.5429  0.0326  0.3797
#> 
#> Columns 10 to 18 -0.2325  0.6530  0.0345 -0.2001 -0.2965  0.4507  0.2269  0.1432  0.2832
#>   0.4363  0.4829  0.0018 -0.3689 -0.3846  0.3125 -0.0474  0.1765  0.0652
#>  -0.0683  0.0523 -0.2209 -0.3107 -0.6251  0.6452 -0.0359 -0.3465  0.2780
#> 
#> Columns 19 to 20  0.1379 -0.1165
#>   0.0557  0.2462
#>   0.2637  0.1195
#> [ CPUFloatType{2,3,20} ][ grad_fn = <StackBackward0> ]
#>