Applies a multi-layer Elman RNN with \(\tanh\) or \(\mbox{ReLU}\) non-linearity to an input sequence.

Usage

nn_rnn(
  input_size,
  hidden_size,
  num_layers = 1,
  nonlinearity = NULL,
  bias = TRUE,
  batch_first = FALSE,
  dropout = 0,
  bidirectional = FALSE,
  ...
)

Arguments

input_size

The number of expected features in the input x

hidden_size

The number of features in the hidden state h

num_layers

Number of recurrent layers. E.g., setting num_layers=2 would mean stacking two RNNs together to form a stacked RNN, with the second RNN taking in outputs of the first RNN and computing the final results. Default: 1

nonlinearity

The non-linearity to use. Can be either 'tanh' or 'relu'. Default: 'tanh'

bias

If FALSE, then the layer does not use bias weights b_ih and b_hh. Default: TRUE

batch_first

If TRUE, then the input and output tensors are provided as (batch, seq, feature). Default: FALSE

dropout

If non-zero, introduces a Dropout layer on the outputs of each RNN layer except the last layer, with dropout probability equal to dropout. Default: 0

bidirectional

If TRUE, becomes a bidirectional RNN. Default: FALSE

...

other arguments that can be passed to the super class.
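
A minimal sketch of how these arguments combine (sizes chosen arbitrarily for illustration; not part of the Examples below):

library(torch)

rnn <- nn_rnn(
  input_size = 10, hidden_size = 20, num_layers = 2,
  nonlinearity = "relu",  # use ReLU instead of the default tanh
  batch_first = TRUE,     # input/output tensors are (batch, seq, feature)
  dropout = 0.2,          # dropout applied to the outputs of the first layer
  bidirectional = TRUE    # doubles the output feature dimension
)
x <- torch_randn(4, 7, 10)  # (batch = 4, seq = 7, feature = 10)
out <- rnn(x)
out[[1]]$shape  # (4, 7, 40): num_directions * hidden_size = 40
out[[2]]$shape  # (4, 4, 20): h_n stays (num_layers * num_directions, batch, hidden_size) regardless of batch_first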

Details

For each element in the input sequence, each layer computes the following function:

$$ h_t = \tanh(W_{ih} x_t + b_{ih} + W_{hh} h_{(t-1)} + b_{hh}) $$

where \(h_t\) is the hidden state at time t, \(x_t\) is the input at time t, and \(h_{(t-1)}\) is the hidden state at time t-1 (or the initial hidden state at time 0). If nonlinearity is 'relu', then \(\mbox{ReLU}\) is used instead of \(\tanh\).
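
As a rough illustration of this recurrence, a single tanh step can be written by hand with stand-in weights (a sketch, not the module's internal implementation):

library(torch)

input_size <- 10
hidden_size <- 20

# hypothetical stand-ins for W_ih, W_hh, b_ih, b_hh
w_ih <- torch_randn(hidden_size, input_size)
w_hh <- torch_randn(hidden_size, hidden_size)
b_ih <- torch_zeros(hidden_size)
b_hh <- torch_zeros(hidden_size)

x_t    <- torch_randn(input_size)   # input at time t
h_prev <- torch_zeros(hidden_size)  # hidden state at time t - 1

h_t <- torch_tanh(torch_matmul(w_ih, x_t) + b_ih + torch_matmul(w_hh, h_prev) + b_hh)
h_t$shape  # hidden_size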

Inputs

  • input of shape (seq_len, batch, input_size): tensor containing the features of the input sequence. The input can also be a packed variable length sequence.

  • h_0 of shape (num_layers * num_directions, batch, hidden_size): tensor containing the initial hidden state for each element in the batch. Defaults to zero if not provided. If the RNN is bidirectional, num_directions should be 2, else it should be 1.
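
For example (a short sketch), h_0 can be supplied explicitly or omitted, in which case it is taken to be zeros:

library(torch)

rnn <- nn_rnn(10, 20, num_layers = 2)
input <- torch_randn(5, 3, 10)  # (seq_len, batch, input_size)
h0 <- torch_zeros(2, 3, 20)     # (num_layers * num_directions, batch, hidden_size)
out_explicit <- rnn(input, h0)
out_default  <- rnn(input)      # h_0 defaults to zeros
torch_allclose(out_explicit[[1]], out_default[[1]])  # TRUE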

Outputs

  • output of shape (seq_len, batch, num_directions * hidden_size): tensor containing the output features (h_t) from the last layer of the RNN, for each t. If an nn_packed_sequence has been given as the input, the output will also be a packed sequence. For the unpacked case, the directions can be separated using output$view(seq_len, batch, num_directions, hidden_size), with forward and backward being direction 0 and 1 respectively (see the sketch after this list). Similarly, the directions can be separated in the packed case.

  • h_n of shape (num_layers * num_directions, batch, hidden_size): tensor containing the hidden state for t = seq_len. Like output, the layers can be separated using h_n$view(num_layers, num_directions, batch, hidden_size).
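
A brief sketch of separating the two directions of a bidirectional RNN, as described above (note that R indexing is 1-based, so the forward direction is index 1):

library(torch)

rnn <- nn_rnn(10, 20, bidirectional = TRUE)
input <- torch_randn(5, 3, 10)
out <- rnn(input)
output <- out[[1]]                         # (5, 3, 40) = (seq_len, batch, num_directions * hidden_size)
directions <- output$view(c(5, 3, 2, 20))  # (seq_len, batch, num_directions, hidden_size)
directions[ , , 1, ]$shape                 # forward direction: (5, 3, 20)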

Shape

  • Input1: \((L, N, H_{in})\) tensor containing input features, where \(H_{in}=\mbox{input\_size}\) and L is the sequence length.

  • Input2: \((S, N, H_{out})\) tensor containing the initial hidden state for each element in the batch, where \(S=\mbox{num\_layers} * \mbox{num\_directions}\) and \(H_{out}=\mbox{hidden\_size}\). Defaults to zero if not provided. If the RNN is bidirectional, num_directions should be 2, else it should be 1.

  • Output1: \((L, N, H_{all})\) where \(H_{all}=\mbox{num\_directions} * \mbox{hidden\_size}\)

  • Output2: \((S, N, H_{out})\) tensor containing the next hidden state for each element in the batch

Attributes

  • weight_ih_l[k]: the learnable input-hidden weights of the k-th layer, of shape (hidden_size, input_size) for k = 0. Otherwise, the shape is (hidden_size, num_directions * hidden_size)

  • weight_hh_l[k]: the learnable hidden-hidden weights of the k-th layer, of shape (hidden_size, hidden_size)

  • bias_ih_l[k]: the learnable input-hidden bias of the k-th layer, of shape (hidden_size)

  • bias_hh_l[k]: the learnable hidden-hidden bias of the k-th layer, of shape (hidden_size)
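
On a constructed module these can be listed through its parameters, e.g. (a small sketch; the layer-index suffix in the names follows the weight_ih_l[k] / weight_hh_l[k] pattern above):

library(torch)

rnn <- nn_rnn(10, 20, num_layers = 2)
names(rnn$parameters)  # weight_ih_l*, weight_hh_l*, bias_ih_l*, bias_hh_l* for each layer
sapply(rnn$parameters, function(p) paste(p$shape, collapse = " x "))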

Note

All the weights and biases are initialized from \(\mathcal{U}(-\sqrt{k}, \sqrt{k})\) where \(k = \frac{1}{\mbox{hidden\_size}}\).
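
A quick empirical check of this (sketch): with hidden_size = 20, k = 1/20, so every value should fall within (-sqrt(k), sqrt(k)), roughly (-0.224, 0.224):

library(torch)

rnn <- nn_rnn(10, 20)
k <- 1 / 20
all(sapply(rnn$parameters, function(p) {
  vals <- as_array(p$detach())
  all(abs(vals) <= sqrt(k))
}))  # TRUE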

Examples

if (torch_is_installed()) {
rnn <- nn_rnn(10, 20, 2)        # input_size = 10, hidden_size = 20, num_layers = 2
input <- torch_randn(5, 3, 10)  # (seq_len = 5, batch = 3, input_size = 10)
h0 <- torch_randn(2, 3, 20)     # (num_layers * num_directions = 2, batch = 3, hidden_size = 20)
rnn(input, h0)
}
#> [[1]]
#> torch_tensor
#> (1,.,.) = 
#>  Columns 1 to 9 -0.0041  0.2095 -0.4141  0.7584 -0.5470 -0.7235  0.8878 -0.0063 -0.5725
#>  -0.4154  0.9757 -0.3711  0.8645 -0.5103  0.6118 -0.0740 -0.3501  0.6530
#>  -0.2476  0.5131 -0.8975 -0.1427 -0.3556  0.7978 -0.9328 -0.2035  0.8613
#> 
#> Columns 10 to 18 -0.0897  0.1856  0.2129  0.4478  0.8005 -0.5169  0.1558  0.6249 -0.0119
#>   0.8280  0.7837  0.2949  0.2668  0.8226 -0.8139  0.8366 -0.4851  0.4219
#>  -0.1822  0.5128 -0.1541 -0.1275 -0.3195  0.4319 -0.0475  0.1846 -0.1177
#> 
#> Columns 19 to 20  0.5096 -0.5260
#>   0.7043  0.2089
#>  -0.1045  0.1081
#> 
#> (2,.,.) = 
#>  Columns 1 to 9 -0.4295  0.2781  0.5380 -0.2599 -0.0597  0.3913 -0.0477 -0.2610  0.4648
#>   0.3667  0.3459 -0.5940 -0.4140 -0.4425  0.7825 -0.5971 -0.1403  0.5328
#>   0.0038  0.5498 -0.5889 -0.2807 -0.4888  0.6547  0.1624 -0.0960  0.0184
#> 
#> Columns 10 to 18  0.0876  0.3148 -0.0268 -0.0628  0.3917 -0.5829  0.0215  0.0684 -0.2050
#>  -0.0757 -0.0758 -0.6377 -0.1882  0.0236 -0.4869  0.3140 -0.2914 -0.6599
#>  -0.0439  0.0555 -0.2933 -0.3313 -0.3724  0.2435  0.4096  0.6152 -0.7066
#> 
#> Columns 19 to 20 -0.4061  0.5851
#>  -0.2419 -0.0355
#>   0.0737 -0.5461
#> 
#> (3,.,.) = 
#>  Columns 1 to 9  0.3784 -0.1948 -0.3001 -0.3820 -0.0654 -0.0552  0.0874 -0.6842  0.2488
#>   0.0259  0.3266 -0.2174 -0.2542 -0.3447  0.8912  0.0535  0.0385  0.2697
#>   0.0221  0.6005 -0.6354  0.4639 -0.6273  0.5738 -0.0110 -0.2484  0.2332
#> ... [the output was truncated (use n=-1 to disable)]
#> [ CPUFloatType{5,3,20} ][ grad_fn = <StackBackward0> ]
#> 
#> [[2]]
#> torch_tensor
#> (1,.,.) = 
#>  Columns 1 to 9  0.0344  0.2166  0.0263  0.0747 -0.3544  0.4259  0.3048  0.2853 -0.3600
#>   0.2160 -0.0160 -0.2488 -0.5330 -0.3135 -0.1380  0.6910 -0.2052  0.1497
#>   0.3476 -0.5181 -0.4609 -0.5661 -0.4720  0.2220 -0.2130 -0.5415 -0.0655
#> 
#> Columns 10 to 18 -0.5437  0.4391 -0.1266 -0.6935  0.2304 -0.1158  0.0866  0.2069 -0.4669
#>  -0.0816  0.6534 -0.1972 -0.4021  0.1592  0.3782 -0.3698 -0.0884 -0.3004
#>  -0.1405  0.5343 -0.2581 -0.8582  0.3629 -0.6251 -0.4619  0.1403 -0.3096
#> 
#> Columns 19 to 20 -0.2407 -0.2154
#>   0.1458  0.5957
#>  -0.3155 -0.7977
#> 
#> (2,.,.) = 
#>  Columns 1 to 9  0.5116  0.2050 -0.1882  0.1845 -0.1332  0.2329 -0.0817 -0.5848  0.5571
#>   0.1684 -0.1583 -0.2016  0.1267 -0.1420 -0.0922  0.4318 -0.5668  0.6530
#>   0.2251  0.1015 -0.4805 -0.2975 -0.1839  0.6501 -0.0226 -0.3916  0.3801
#> 
#> Columns 10 to 18  0.0117  0.0645 -0.2176  0.0822  0.6376 -0.0607  0.0442  0.4990 -0.3012
#>   0.1699 -0.1989  0.1334 -0.1286  0.5287  0.1411  0.2317  0.6409 -0.0845
#>  -0.3261  0.1579 -0.0895 -0.2588  0.1015  0.3850  0.2858  0.5040 -0.5516
#> 
#> Columns 19 to 20 -0.0107 -0.0776
#>  -0.2223 -0.0673
#>   0.1167 -0.3942
#> [ CPUFloatType{2,3,20} ][ grad_fn = <StackBackward0> ]
#>