
Applies a multi-layer Elman RNN with \(\tanh\) or \(\mbox{ReLU}\) non-linearity to an input sequence.

Usage

nn_rnn(
  input_size,
  hidden_size,
  num_layers = 1,
  nonlinearity = NULL,
  bias = TRUE,
  batch_first = FALSE,
  dropout = 0,
  bidirectional = FALSE,
  ...
)

Arguments

input_size

The number of expected features in the input x

hidden_size

The number of features in the hidden state h

num_layers

Number of recurrent layers. E.g., setting num_layers=2 would mean stacking two RNNs together to form a stacked RNN, with the second RNN taking in outputs of the first RNN and computing the final results. Default: 1

nonlinearity

The non-linearity to use. Can be either 'tanh' or 'relu' (see the sketch after this argument list). Default: 'tanh'

bias

If FALSE, then the layer does not use bias weights b_ih and b_hh. Default: TRUE

batch_first

If TRUE, then the input and output tensors are provided as (batch, seq, feature). Default: FALSE

dropout

If non-zero, introduces a Dropout layer on the outputs of each RNN layer except the last layer, with dropout probability equal to dropout. Default: 0

bidirectional

If TRUE, becomes a bidirectional RNN. Default: FALSE

...

Other arguments that can be passed to the super class.
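
A brief, hedged sketch of how some of these arguments might be combined; the sizes below are arbitrary and chosen only for illustration:

if (torch_is_installed()) {
library(torch)

# two stacked layers with ReLU non-linearity and dropout between layers
rnn_relu <- nn_rnn(
  input_size = 10, hidden_size = 20, num_layers = 2,
  nonlinearity = "relu", dropout = 0.25
)

# a single-layer bidirectional RNN
rnn_bi <- nn_rnn(input_size = 10, hidden_size = 20, bidirectional = TRUE)
}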

Details

For each element in the input sequence, each layer computes the following function:

$$ h_t = \tanh(W_{ih} x_t + b_{ih} + W_{hh} h_{(t-1)} + b_{hh}) $$

where \(h_t\) is the hidden state at time t, \(x_t\) is the input at time t, and \(h_{(t-1)}\) is the hidden state of the previous layer at time t-1 or the initial hidden state at time 0. If nonlinearity is 'relu', then \(\mbox{ReLU}\) is used instead of \(\tanh\).
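
As an informal check of this recurrence (not part of the documented API), a single time step of a one-layer RNN can be reproduced from the layer's parameters. The parameters are looked up by name prefix here rather than by hard-coding their exact names:

if (torch_is_installed()) {
library(torch)
rnn <- nn_rnn(input_size = 4, hidden_size = 3, num_layers = 1)
x_t <- torch_randn(4)     # input at time t
h_prev <- torch_zeros(3)  # initial hidden state

p <- rnn$parameters
w_ih <- p[[grep("^weight_ih", names(p))[1]]]
w_hh <- p[[grep("^weight_hh", names(p))[1]]]
b_ih <- p[[grep("^bias_ih", names(p))[1]]]
b_hh <- p[[grep("^bias_hh", names(p))[1]]]

# h_t = tanh(W_ih x_t + b_ih + W_hh h_{t-1} + b_hh)
h_t <- torch_tanh(
  torch_matmul(w_ih, x_t) + b_ih + torch_matmul(w_hh, h_prev) + b_hh
)

# should agree with the module's own output for a length-1 sequence
out <- rnn(x_t$view(c(1, 1, 4)))
h_t
out[[1]]
}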

Inputs

  • input of shape (seq_len, batch, input_size): tensor containing the features of the input sequence. The input can also be a packed variable length sequence.

  • h_0 of shape (num_layers * num_directions, batch, hidden_size): tensor containing the initial hidden state for each element in the batch. Defaults to zero if not provided. If the RNN is bidirectional, num_directions should be 2, else it should be 1.

Outputs

  • output of shape (seq_len, batch, num_directions * hidden_size): tensor containing the output features (h_t) from the last layer of the RNN, for each t. If a packed sequence has been given as the input, the output will also be a packed sequence. For the unpacked case, the directions can be separated using output$view(seq_len, batch, num_directions, hidden_size), with forward and backward being direction 0 and 1 respectively. Similarly, the directions can be separated in the packed case.

  • h_n of shape (num_layers * num_directions, batch, hidden_size): tensor containing the hidden state for t = seq_len. Like output, the layers can be separated using h_n$view(num_layers, num_directions, batch, hidden_size).
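
A small sketch of these input and output conventions for a bidirectional, two-layer RNN; only the shapes are of interest here:

if (torch_is_installed()) {
library(torch)
rnn <- nn_rnn(input_size = 10, hidden_size = 20, num_layers = 2,
  bidirectional = TRUE)
input <- torch_randn(5, 3, 10)   # (seq_len, batch, input_size)
h0 <- torch_zeros(2 * 2, 3, 20)  # (num_layers * num_directions, batch, hidden_size)

out <- rnn(input, h0)
output <- out[[1]]
h_n <- out[[2]]

dim(output)  # 5 3 40: (seq_len, batch, num_directions * hidden_size)
dim(h_n)     # 4 3 20: (num_layers * num_directions, batch, hidden_size)

# split the output into its two directions along a new dimension
dirs <- output$view(c(5, 3, 2, 20))
dim(dirs)    # 5 3 2 20
}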

Shape

  • Input1: \((L, N, H_{in})\) tensor containing input features, where \(H_{in}=\mbox{input\_size}\) and L is the sequence length.

  • Input2: \((S, N, H_{out})\) tensor containing the initial hidden state for each element in the batch, where \(S=\mbox{num\_layers} * \mbox{num\_directions}\) and \(H_{out}=\mbox{hidden\_size}\). Defaults to zero if not provided. If the RNN is bidirectional, num_directions should be 2, else it should be 1.

  • Output1: \((L, N, H_{all})\) where \(H_{all}=\mbox{num\_directions} * \mbox{hidden\_size}\)

  • Output2: \((S, N, H_{out})\) tensor containing the next hidden state for each element in the batch
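
For comparison, when batch_first = TRUE the input and output swap their first two dimensions to (N, L, ...), while the hidden-state tensors keep the \((S, N, H_{out})\) layout; a minimal sketch:

if (torch_is_installed()) {
library(torch)
rnn <- nn_rnn(input_size = 10, hidden_size = 20, batch_first = TRUE)
input <- torch_randn(3, 5, 10)  # (batch, seq, input_size)

out <- rnn(input)
dim(out[[1]])  # 3 5 20: output is (batch, seq, hidden_size)
dim(out[[2]])  # 1 3 20: h_n keeps (num_layers * num_directions, batch, hidden_size)
}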

Attributes

  • weight_ih_l[k]: the learnable input-hidden weights of the k-th layer, of shape (hidden_size, input_size) for k = 0. Otherwise, the shape is (hidden_size, num_directions * hidden_size)

  • weight_hh_l[k]: the learnable hidden-hidden weights of the k-th layer, of shape (hidden_size, hidden_size)

  • bias_ih_l[k]: the learnable input-hidden bias of the k-th layer, of shape (hidden_size)

  • bias_hh_l[k]: the learnable hidden-hidden bias of the k-th layer, of shape (hidden_size)
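
The exact names under which these parameters are exposed, together with their shapes, can be listed from the module itself; a quick sketch:

if (torch_is_installed()) {
library(torch)
rnn <- nn_rnn(input_size = 10, hidden_size = 20, num_layers = 2)

# parameter names and their shapes
lapply(rnn$parameters, dim)
}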

Note

All the weights and biases are initialized from \(\mathcal{U}(-\sqrt{k}, \sqrt{k})\) where \(k = \frac{1}{\mbox{hidden\_size}}\)
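
An informal way to check this initialization range on a freshly created module (assuming nothing has modified the parameters yet):

if (torch_is_installed()) {
library(torch)
hidden_size <- 20
rnn <- nn_rnn(input_size = 10, hidden_size = hidden_size)

k <- 1 / hidden_size
# every weight and bias should lie within [-sqrt(k), sqrt(k)]
all(vapply(
  rnn$parameters,
  function(p) as.numeric(p$abs()$max()) <= sqrt(k),
  logical(1)
))
}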

Examples

if (torch_is_installed()) {
rnn <- nn_rnn(10, 20, 2)
input <- torch_randn(5, 3, 10)
h0 <- torch_randn(2, 3, 20)
rnn(input, h0)
}
#> [[1]]
#> torch_tensor
#> (1,.,.) = 
#>  Columns 1 to 9  0.8040 -0.3092 -0.0555  0.2335 -0.2217 -0.5908  0.2679  0.3845  0.0937
#>  -0.2382  0.5811  0.6305  0.0844 -0.4830 -0.2957  0.0956  0.9263  0.4017
#>  -0.4306 -0.3961  0.6024  0.4475  0.1841 -0.6021 -0.5018  0.3913 -0.7079
#> 
#> Columns 10 to 18 -0.5059 -0.6756  0.8349  0.4919 -0.4339 -0.8934  0.2698 -0.2857 -0.4519
#>  -0.8507 -0.0182  0.2623 -0.3637 -0.2642  0.2595  0.0666  0.7768  0.1554
#>  -0.9662 -0.0063 -0.6959 -0.4043 -0.0955  0.9586 -0.6057  0.0511  0.6845
#> 
#> Columns 19 to 20  0.7976  0.1734
#>   0.6827 -0.1776
#>  -0.7289 -0.7782
#> 
#> (2,.,.) = 
#>  Columns 1 to 9  0.3889 -0.3601 -0.0187 -0.2968 -0.0215  0.0275 -0.0976 -0.6676  0.4277
#>  -0.8194  0.7075 -0.1063 -0.4142  0.4080 -0.4120  0.0666  0.3513  0.0828
#>  -0.3121  0.2380  0.0845 -0.3078  0.1855 -0.1288  0.5471  0.0283  0.2182
#> 
#> Columns 10 to 18 -0.4533 -0.4032  0.5500  0.3453  0.1722  0.1118  0.1523 -0.4451 -0.5157
#>  -0.3841  0.3041  0.2955 -0.3555  0.3039  0.3432  0.4212  0.2755 -0.1093
#>  -0.1794  0.1411  0.4756 -0.0663  0.7400  0.2700  0.3029  0.5518  0.1165
#> 
#> Columns 19 to 20 -0.2847 -0.0894
#>   0.0462 -0.1315
#>  -0.1700 -0.5845
#> 
#> (3,.,.) = 
#>  Columns 1 to 9 -0.0975  0.1631 -0.0524 -0.2066 -0.5398  0.0839 -0.2147 -0.1502  0.4560
#>  -0.2505  0.3777 -0.1579 -0.5296 -0.0623  0.2707  0.3289  0.2627  0.5263
#>  -0.7112  0.1882 -0.4830 -0.2793 -0.2617  0.2113  0.1065  0.3081  0.4845
#> ... [the output was truncated (use n=-1 to disable)]
#> [ CPUFloatType{5,3,20} ][ grad_fn = <StackBackward0> ]
#> 
#> [[2]]
#> torch_tensor
#> (1,.,.) = 
#>  Columns 1 to 9 -0.5894 -0.8975 -0.1316 -0.1465 -0.3198  0.4174 -0.1543 -0.7180  0.3137
#>   0.1829 -0.3649 -0.2477 -0.2714 -0.3783 -0.3590 -0.3948 -0.0928  0.3611
#>  -0.4389 -0.2385  0.5147 -0.6942 -0.1651  0.1956 -0.0757 -0.4346  0.1995
#> 
#> Columns 10 to 18 -0.5749 -0.2482  0.7138  0.5591 -0.0564 -0.2561 -0.7119 -0.2263 -0.1713
#>   0.5589  0.4213  0.1974  0.3019 -0.4558  0.4529  0.0645  0.3129  0.0558
#>  -0.1634  0.0046 -0.1113 -0.4117 -0.2508  0.2835  0.1263 -0.1403  0.2300
#> 
#> Columns 19 to 20 -0.0693 -0.4724
#>   0.0856 -0.1832
#>  -0.2027  0.3731
#> 
#> (2,.,.) = 
#>  Columns 1 to 9 -0.0995  0.4661  0.0508 -0.1046 -0.6234  0.0702 -0.2974  0.0622  0.1838
#>  -0.3789  0.2820 -0.2073 -0.2903 -0.1876  0.2477  0.2295 -0.3757  0.2671
#>  -0.0732  0.4498 -0.3874 -0.3978 -0.2299  0.2338  0.1490 -0.2620  0.6278
#> 
#> Columns 10 to 18 -0.4097 -0.0530  0.4075 -0.3532 -0.2414 -0.1718  0.1625  0.0727 -0.1397
#>  -0.0966 -0.0552  0.3717 -0.0748  0.3183  0.0857  0.1956 -0.0692 -0.4201
#>  -0.0819 -0.3396  0.5400  0.0478  0.1245 -0.1068  0.0449  0.1172 -0.4559
#> 
#> Columns 19 to 20  0.0983  0.2485
#>   0.0664 -0.0674
#>   0.3425  0.2600
#> [ CPUFloatType{2,3,20} ][ grad_fn = <StackBackward0> ]
#>