Applies a multi-layer Elman RNN with \(\tanh\) or \(\mbox{ReLU}\) non-linearity to an input sequence.

Usage

nn_rnn(
  input_size,
  hidden_size,
  num_layers = 1,
  nonlinearity = NULL,
  bias = TRUE,
  batch_first = FALSE,
  dropout = 0,
  bidirectional = FALSE,
  ...
)

Arguments

input_size

The number of expected features in the input x

hidden_size

The number of features in the hidden state h

num_layers

Number of recurrent layers. E.g., setting num_layers=2 would mean stacking two RNNs together to form a stacked RNN, with the second RNN taking in outputs of the first RNN and computing the final results. Default: 1

nonlinearity

The non-linearity to use. Can be either 'tanh' or 'relu'. Default: 'tanh'

bias

If FALSE, then the layer does not use bias weights b_ih and b_hh. Default: TRUE

batch_first

If TRUE, then the input and output tensors are provided as (batch, seq, feature) rather than (seq, batch, feature); see the sketch after this argument list. Default: FALSE

dropout

If non-zero, introduces a Dropout layer on the outputs of each RNN layer except the last layer, with dropout probability equal to dropout. Default: 0

bidirectional

If TRUE, becomes a bidirectional RNN. Default: FALSE

...

other arguments that can be passed to the super class.
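As an illustration of the batch_first argument above, here is a minimal sketch (sizes chosen arbitrarily for this example, not taken from this page): with batch_first = TRUE the module consumes and returns (batch, seq, feature) tensors, while the returned hidden state keeps the (layers, batch, hidden) layout.

if (torch_is_installed()) {
  # hypothetical sizes; batch_first = TRUE means (batch, seq, feature) layout
  rnn <- nn_rnn(input_size = 10, hidden_size = 20, batch_first = TRUE)
  input <- torch_randn(3, 5, 10) # (batch = 3, seq = 5, input_size = 10)
  out <- rnn(input)              # h0 defaults to zeros
  dim(out[[1]])                  # output: (3, 5, 20)
  dim(out[[2]])                  # h_n: (1, 3, 20)
}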

Details

For each element in the input sequence, each layer computes the following function:

$$ h_t = \tanh(W_{ih} x_t + b_{ih} + W_{hh} h_{(t-1)} + b_{hh}) $$

where \(h_t\) is the hidden state at time t, \(x_t\) is the input at time t, and \(h_{(t-1)}\) is the hidden state of the previous layer at time t-1 or the initial hidden state at time 0. If nonlinearity is 'relu', then \(\mbox{ReLU}\) is used instead of \(\tanh\).
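As a small illustration of this recurrence, the sketch below computes a single tanh step by hand. The weight and bias tensors are ad-hoc stand-ins (randomly generated, not the module's own parameters), and nnf_linear() is used only as a convenient way to write \(W x + b\).

if (torch_is_installed()) {
  # one step of h_t = tanh(W_ih x_t + b_ih + W_hh h_{t-1} + b_hh),
  # with arbitrary sizes and hypothetical weights
  input_size <- 4
  hidden_size <- 3
  w_ih <- torch_randn(hidden_size, input_size)
  w_hh <- torch_randn(hidden_size, hidden_size)
  b_ih <- torch_randn(hidden_size)
  b_hh <- torch_randn(hidden_size)
  x_t <- torch_randn(1, input_size)       # input at time t (batch of 1)
  h_prev <- torch_zeros(1, hidden_size)   # initial hidden state at time 0
  h_t <- torch_tanh(
    nnf_linear(x_t, w_ih, b_ih) + nnf_linear(h_prev, w_hh, b_hh)
  )
  h_t
}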

Inputs

  • input of shape (seq_len, batch, input_size): tensor containing the features of the input sequence. The input can also be a packed variable length sequence.

  • h_0 of shape (num_layers * num_directions, batch, hidden_size): tensor containing the initial hidden state for each element in the batch. Defaults to zero if not provided. If the RNN is bidirectional, num_directions should be 2, else it should be 1.

Outputs

  • output of shape (seq_len, batch, num_directions * hidden_size): tensor containing the output features (h_t) from the last layer of the RNN, for each t. If a packed sequence has been given as the input, the output will also be a packed sequence. For the unpacked case, the directions can be separated using output$view(c(seq_len, batch, num_directions, hidden_size)), with forward and backward being direction 0 and 1 respectively (see the sketch after this list). Similarly, the directions can be separated in the packed case.

  • h_n of shape (num_layers * num_directions, batch, hidden_size): tensor containing the hidden state for t = seq_len. Like output, the layers can be separated using h_n$view(c(num_layers, num_directions, batch, hidden_size)).
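For the unpacked bidirectional case, a minimal sketch of separating the two directions could look like the following (shapes are arbitrary example values, and the view call mirrors the description above):

if (torch_is_installed()) {
  # bidirectional module; output has num_directions * hidden_size features
  rnn <- nn_rnn(input_size = 10, hidden_size = 20, bidirectional = TRUE)
  input <- torch_randn(5, 3, 10)               # (seq_len, batch, input_size)
  out <- rnn(input)
  output <- out[[1]]                           # (5, 3, 2 * 20)
  by_direction <- output$view(c(5, 3, 2, 20))  # (seq_len, batch, num_directions, hidden_size)
  forward  <- by_direction[, , 1, ]            # direction 0 above (R indexing is 1-based)
  backward <- by_direction[, , 2, ]            # direction 1 above
  dim(forward)                                 # (5, 3, 20)
}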

Shape

  • Input1: \((L, N, H_{in})\) tensor containing input features, where \(H_{in}=\mbox{input\_size}\) and \(L\) is the sequence length.

  • Input2: \((S, N, H_{out})\) tensor containing the initial hidden state for each element in the batch, where \(S=\mbox{num\_layers} * \mbox{num\_directions}\) and \(H_{out}=\mbox{hidden\_size}\). Defaults to zero if not provided. If the RNN is bidirectional, num_directions should be 2, else it should be 1.

  • Output1: \((L, N, H_{all})\) where \(H_{all}=\mbox{num\_directions} * \mbox{hidden\_size}\)

  • Output2: \((S, N, H_{out})\) tensor containing the next hidden state for each element in the batch

Attributes

  • weight_ih_l[k]: the learnable input-hidden weights of the k-th layer, of shape (hidden_size, input_size) for k = 0. Otherwise, the shape is (hidden_size, num_directions * hidden_size)

  • weight_hh_l[k]: the learnable hidden-hidden weights of the k-th layer, of shape (hidden_size, hidden_size)

  • bias_ih_l[k]: the learnable input-hidden bias of the k-th layer, of shape (hidden_size)

  • bias_hh_l[k]: the learnable hidden-hidden bias of the k-th layer, of shape (hidden_size)
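To inspect these attributes on a constructed module, a small illustrative sketch follows; the names use the pattern above, though the layer index reported by the R object may not match the 0-based convention used in this list.

if (torch_is_installed()) {
  rnn <- nn_rnn(input_size = 10, hidden_size = 20, num_layers = 2)
  names(rnn$parameters)       # weight_ih_l*, weight_hh_l*, bias_ih_l*, bias_hh_l* per layer
  lapply(rnn$parameters, dim) # shapes as listed in this section
}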

Note

All the weights and biases are initialized from \(\mathcal{U}(-\sqrt{k}, \sqrt{k})\), where \(k = \frac{1}{\mbox{hidden\_size}}\).
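A quick, purely illustrative check of this initialization range (sizes chosen arbitrarily):

if (torch_is_installed()) {
  hidden_size <- 20
  rnn <- nn_rnn(input_size = 10, hidden_size = hidden_size)
  k <- 1 / hidden_size
  # every freshly initialized weight and bias should lie within (-sqrt(k), sqrt(k))
  all(sapply(rnn$parameters, function(p) as.numeric(p$abs()$max()) <= sqrt(k)))
}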

Examples

if (torch_is_installed()) {
rnn <- nn_rnn(10, 20, 2)
input <- torch_randn(5, 3, 10)
h0 <- torch_randn(2, 3, 20)
rnn(input, h0)
}
#> [[1]]
#> torch_tensor
#> (1,.,.) = 
#>  Columns 1 to 9 -0.6672 -0.5576 -0.7882 -0.5704 -0.1213 -0.1555 -0.4988 -0.2041  0.1730
#>   0.3708  0.4823 -0.3389  0.0649  0.0870  0.3918  0.2740 -0.1167  0.6356
#>  -0.2963  0.2256 -0.8713 -0.4988 -0.4066 -0.6238 -0.5727 -0.8091  0.0541
#> 
#> Columns 10 to 18  0.4516  0.2719 -0.3985 -0.9430 -0.7006  0.1088 -0.3734 -0.2566  0.3061
#>   0.3111 -0.9668 -0.3981 -0.5069  0.7400 -0.8135 -0.3103 -0.2433 -0.0326
#>   0.1348  0.4815  0.8166  0.1176 -0.6106 -0.6241 -0.0798 -0.2404  0.3189
#> 
#> Columns 19 to 20 -0.6957 -0.3317
#>  -0.5450 -0.1157
#>  -0.6601 -0.0846
#> 
#> (2,.,.) = 
#>  Columns 1 to 9  0.6531 -0.1276 -0.4058 -0.2371 -0.1864 -0.0673 -0.1344 -0.0563 -0.0762
#>   0.3822  0.7262  0.2071 -0.0514 -0.3563 -0.0384 -0.3935 -0.2740  0.5408
#>   0.7960  0.0880 -0.0993 -0.2839 -0.1842 -0.4034  0.0735 -0.0556 -0.1229
#> 
#> Columns 10 to 18  0.0177  0.0306 -0.2214 -0.0196  0.1127 -0.3426 -0.4222 -0.2350  0.0771
#>   0.2666 -0.6826  0.0378  0.0231 -0.0792 -0.7421 -0.2437 -0.2958  0.6728
#>   0.4232  0.3214 -0.5402 -0.0659 -0.0868 -0.7073 -0.5748 -0.5727  0.2702
#> 
#> Columns 19 to 20 -0.5377  0.5466
#>  -0.4177  0.5886
#>   0.1673  0.3668
#> 
#> (3,.,.) = 
#>  Columns 1 to 9  0.4817  0.3553  0.0310  0.2629 -0.2499 -0.3908  0.2113 -0.1869  0.3778
#>   0.0696 -0.0065  0.0965 -0.0875 -0.1411 -0.2435 -0.0177  0.2115  0.8363
#>   0.4694  0.2319 -0.1121 -0.0015 -0.2128 -0.5588  0.0025  0.0655  0.5888
#> ... [the output was truncated (use n=-1 to disable)]
#> [ CPUFloatType{5,3,20} ][ grad_fn = <StackBackward0> ]
#> 
#> [[2]]
#> torch_tensor
#> (1,.,.) = 
#>  Columns 1 to 9 -0.3043 -0.2065  0.3745 -0.3587  0.4909  0.1480 -0.0934 -0.0568 -0.5119
#>  -0.1698 -0.4295  0.1618  0.8840  0.8403  0.1521 -0.0194  0.0346 -0.5030
#>  -0.5914 -0.2615  0.0181  0.3092  0.0269 -0.2392  0.3054 -0.1592 -0.2449
#> 
#> Columns 10 to 18 -0.2044  0.1393  0.0223 -0.6050  0.0864 -0.2030 -0.1884  0.1294 -0.5688
#>  -0.7192  0.2503 -0.4436 -0.5366  0.3780 -0.7444 -0.3009  0.0491 -0.6804
#>  -0.7611  0.5710 -0.3136 -0.6655  0.5523 -0.2799  0.0534 -0.0332 -0.3585
#> 
#> Columns 19 to 20  0.1900 -0.1943
#>   0.4012  0.5662
#>   0.1809 -0.0002
#> 
#> (2,.,.) = 
#>  Columns 1 to 9 -0.2490  0.1279  0.0560 -0.0352 -0.2997 -0.3424  0.1478  0.0832  0.5699
#>   0.2878 -0.2722  0.3503  0.0755 -0.2243 -0.2110 -0.0385  0.2922  0.6630
#>   0.1584 -0.0420 -0.1141 -0.1354 -0.2968 -0.3121 -0.0339  0.1752  0.3961
#> 
#> Columns 10 to 18  0.5766 -0.4965  0.0478  0.0154  0.0152 -0.4690 -0.5169 -0.3258 -0.1789
#>   0.5336 -0.7860  0.3072  0.5158  0.2224 -0.5046 -0.2176 -0.1864  0.0163
#>   0.4292 -0.7345  0.3020  0.3357  0.2845 -0.4391  0.0123 -0.2316 -0.1739
#> 
#> Columns 19 to 20  0.2999  0.2000
#>  -0.0739  0.1988
#>   0.0014  0.0869
#> [ CPUFloatType{2,3,20} ][ grad_fn = <StackBackward0> ]
#>