
Applies a multi-layer Elman RNN with \(\tanh\) or \(\mbox{ReLU}\) non-linearity to an input sequence.

Usage

nn_rnn(
  input_size,
  hidden_size,
  num_layers = 1,
  nonlinearity = NULL,
  bias = TRUE,
  batch_first = FALSE,
  dropout = 0,
  bidirectional = FALSE,
  ...
)

Arguments

input_size

The number of expected features in the input x

hidden_size

The number of features in the hidden state h

num_layers

Number of recurrent layers. E.g., setting num_layers=2 would mean stacking two RNNs together to form a stacked RNN, with the second RNN taking in outputs of the first RNN and computing the final results. Default: 1

nonlinearity

The non-linearity to use. Can be either 'tanh' or 'relu' (see the sketch after this argument list). Default: 'tanh'

bias

If FALSE, then the layer does not use bias weights b_ih and b_hh. Default: TRUE

batch_first

If TRUE, then the input and output tensors are provided as (batch, seq, feature). Default: FALSE

dropout

If non-zero, introduces a Dropout layer on the outputs of each RNN layer except the last layer, with dropout probability equal to dropout. Default: 0

bidirectional

If TRUE, becomes a bidirectional RNN. Default: FALSE

...

Other arguments that can be passed to the super class.
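
A brief, hedged sketch of how some of these arguments might be combined; the sizes below are arbitrary and chosen only for illustration:

if (torch_is_installed()) {
library(torch)

# two stacked layers with ReLU non-linearity and dropout between layers
rnn_relu <- nn_rnn(
  input_size = 10, hidden_size = 20, num_layers = 2,
  nonlinearity = "relu", dropout = 0.25
)

# a single-layer bidirectional RNN
rnn_bi <- nn_rnn(input_size = 10, hidden_size = 20, bidirectional = TRUE)
}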

Details

For each element in the input sequence, each layer computes the following function:

$$ h_t = \tanh(W_{ih} x_t + b_{ih} + W_{hh} h_{(t-1)} + b_{hh}) $$

where \(h_t\) is the hidden state at time t, \(x_t\) is the input at time t, and \(h_{(t-1)}\) is the hidden state of the previous layer at time t-1 or the initial hidden state at time 0. If nonlinearity is 'relu', then \(\mbox{ReLU}\) is used instead of \(\tanh\).
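
As an informal check of this recurrence (not part of the documented API), a single time step of a one-layer RNN can be reproduced from the layer's parameters. The parameters are looked up by name prefix here rather than by hard-coding their exact names:

if (torch_is_installed()) {
library(torch)
rnn <- nn_rnn(input_size = 4, hidden_size = 3, num_layers = 1)
x_t <- torch_randn(4)     # input at time t
h_prev <- torch_zeros(3)  # initial hidden state

p <- rnn$parameters
w_ih <- p[[grep("^weight_ih", names(p))[1]]]
w_hh <- p[[grep("^weight_hh", names(p))[1]]]
b_ih <- p[[grep("^bias_ih", names(p))[1]]]
b_hh <- p[[grep("^bias_hh", names(p))[1]]]

# h_t = tanh(W_ih x_t + b_ih + W_hh h_{t-1} + b_hh)
h_t <- torch_tanh(
  torch_matmul(w_ih, x_t) + b_ih + torch_matmul(w_hh, h_prev) + b_hh
)

# should agree with the module's own output for a length-1 sequence
out <- rnn(x_t$view(c(1, 1, 4)))
h_t
out[[1]]
}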

Inputs

  • input of shape (seq_len, batch, input_size): tensor containing the features of the input sequence. The input can also be a packed variable length sequence.

  • h_0 of shape (num_layers * num_directions, batch, hidden_size): tensor containing the initial hidden state for each element in the batch. Defaults to zero if not provided. If the RNN is bidirectional, num_directions should be 2, else it should be 1.

Outputs

  • output of shape (seq_len, batch, num_directions * hidden_size): tensor containing the output features (h_t) from the last layer of the RNN, for each t. If a packed sequence has been given as the input, the output will also be a packed sequence. For the unpacked case, the directions can be separated using output$view(seq_len, batch, num_directions, hidden_size), with forward and backward being direction 0 and 1 respectively. Similarly, the directions can be separated in the packed case.

  • h_n of shape (num_layers * num_directions, batch, hidden_size): tensor containing the hidden state for t = seq_len. Like output, the layers can be separated using h_n$view(num_layers, num_directions, batch, hidden_size).
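
A small sketch of these input and output conventions for a bidirectional, two-layer RNN; only the shapes are of interest here:

if (torch_is_installed()) {
library(torch)
rnn <- nn_rnn(input_size = 10, hidden_size = 20, num_layers = 2,
  bidirectional = TRUE)
input <- torch_randn(5, 3, 10)   # (seq_len, batch, input_size)
h0 <- torch_zeros(2 * 2, 3, 20)  # (num_layers * num_directions, batch, hidden_size)

out <- rnn(input, h0)
output <- out[[1]]
h_n <- out[[2]]

dim(output)  # 5 3 40: (seq_len, batch, num_directions * hidden_size)
dim(h_n)     # 4 3 20: (num_layers * num_directions, batch, hidden_size)

# split the output into its two directions along a new dimension
dirs <- output$view(c(5, 3, 2, 20))
dim(dirs)    # 5 3 2 20
}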

Shape

  • Input1: \((L, N, H_{in})\) tensor containing input features, where \(H_{in}=\mbox{input\_size}\) and L is the sequence length.

  • Input2: \((S, N, H_{out})\) tensor containing the initial hidden state for each element in the batch, where \(S=\mbox{num\_layers} * \mbox{num\_directions}\) and \(H_{out}=\mbox{hidden\_size}\). Defaults to zero if not provided. If the RNN is bidirectional, num_directions should be 2, else it should be 1.

  • Output1: \((L, N, H_{all})\) where \(H_{all}=\mbox{num\_directions} * \mbox{hidden\_size}\)

  • Output2: \((S, N, H_{out})\) tensor containing the next hidden state for each element in the batch
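
For comparison, when batch_first = TRUE the input and output swap their first two dimensions to (N, L, ...), while the hidden-state tensors keep the \((S, N, H_{out})\) layout; a minimal sketch:

if (torch_is_installed()) {
library(torch)
rnn <- nn_rnn(input_size = 10, hidden_size = 20, batch_first = TRUE)
input <- torch_randn(3, 5, 10)  # (batch, seq, input_size)

out <- rnn(input)
dim(out[[1]])  # 3 5 20: output is (batch, seq, hidden_size)
dim(out[[2]])  # 1 3 20: h_n keeps (num_layers * num_directions, batch, hidden_size)
}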

Attributes

  • weight_ih_l[k]: the learnable input-hidden weights of the k-th layer, of shape (hidden_size, input_size) for k = 0. Otherwise, the shape is (hidden_size, num_directions * hidden_size)

  • weight_hh_l[k]: the learnable hidden-hidden weights of the k-th layer, of shape (hidden_size, hidden_size)

  • bias_ih_l[k]: the learnable input-hidden bias of the k-th layer, of shape (hidden_size)

  • bias_hh_l[k]: the learnable hidden-hidden bias of the k-th layer, of shape (hidden_size)
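
The exact names under which these parameters are exposed, together with their shapes, can be listed from the module itself; a quick sketch:

if (torch_is_installed()) {
library(torch)
rnn <- nn_rnn(input_size = 10, hidden_size = 20, num_layers = 2)

# parameter names and their shapes
lapply(rnn$parameters, dim)
}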

Note

All the weights and biases are initialized from \(\mathcal{U}(-\sqrt{k}, \sqrt{k})\) where \(k = \frac{1}{\mbox{hidden\_size}}\)
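
An informal way to check this initialization range on a freshly created module (assuming nothing has modified the parameters yet):

if (torch_is_installed()) {
library(torch)
hidden_size <- 20
rnn <- nn_rnn(input_size = 10, hidden_size = hidden_size)

k <- 1 / hidden_size
# every weight and bias should lie within [-sqrt(k), sqrt(k)]
all(vapply(
  rnn$parameters,
  function(p) as.numeric(p$abs()$max()) <= sqrt(k),
  logical(1)
))
}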

Examples

if (torch_is_installed()) {
rnn <- nn_rnn(10, 20, 2)
input <- torch_randn(5, 3, 10)
h0 <- torch_randn(2, 3, 20)
rnn(input, h0)
}
#> [[1]]
#> torch_tensor
#> (1,.,.) = 
#>  Columns 1 to 9  0.8040 -0.3092 -0.0555  0.2335 -0.2217 -0.5908  0.2679  0.3845  0.0937
#>  -0.2382  0.5811  0.6305  0.0844 -0.4830 -0.2957  0.0956  0.9263  0.4017
#>  -0.4306 -0.3961  0.6024  0.4475  0.1841 -0.6021 -0.5018  0.3913 -0.7079
#> 
#> Columns 10 to 18 -0.5059 -0.6756  0.8349  0.4919 -0.4339 -0.8934  0.2698 -0.2857 -0.4519
#>  -0.8507 -0.0182  0.2623 -0.3637 -0.2642  0.2595  0.0666  0.7768  0.1554
#>  -0.9662 -0.0063 -0.6959 -0.4043 -0.0955  0.9586 -0.6057  0.0511  0.6845
#> 
#> Columns 19 to 20  0.7976  0.1734
#>   0.6827 -0.1776
#>  -0.7289 -0.7782
#> 
#> (2,.,.) = 
#>  Columns 1 to 9  0.3889 -0.3601 -0.0187 -0.2968 -0.0215  0.0275 -0.0976 -0.6676  0.4277
#>  -0.8194  0.7075 -0.1063 -0.4142  0.4080 -0.4120  0.0666  0.3513  0.0828
#>  -0.3121  0.2380  0.0845 -0.3078  0.1855 -0.1288  0.5471  0.0283  0.2182
#> 
#> Columns 10 to 18 -0.4533 -0.4032  0.5500  0.3453  0.1722  0.1118  0.1523 -0.4451 -0.5157
#>  -0.3841  0.3041  0.2955 -0.3555  0.3039  0.3432  0.4212  0.2755 -0.1093
#>  -0.1794  0.1411  0.4756 -0.0663  0.7400  0.2700  0.3029  0.5518  0.1165
#> 
#> Columns 19 to 20 -0.2847 -0.0894
#>   0.0462 -0.1315
#>  -0.1700 -0.5845
#> 
#> (3,.,.) = 
#>  Columns 1 to 9 -0.0975  0.1631 -0.0524 -0.2066 -0.5398  0.0839 -0.2147 -0.1502  0.4560
#>  -0.2505  0.3777 -0.1579 -0.5296 -0.0623  0.2707  0.3289  0.2627  0.5263
#>  -0.7112  0.1882 -0.4830 -0.2793 -0.2617  0.2113  0.1065  0.3081  0.4845
#> ... [the output was truncated (use n=-1 to disable)]
#> [ CPUFloatType{5,3,20} ][ grad_fn = <StackBackward0> ]
#> 
#> [[2]]
#> torch_tensor
#> (1,.,.) = 
#>  Columns 1 to 9 -0.5894 -0.8975 -0.1316 -0.1465 -0.3198  0.4174 -0.1543 -0.7180  0.3137
#>   0.1829 -0.3649 -0.2477 -0.2714 -0.3783 -0.3590 -0.3948 -0.0928  0.3611
#>  -0.4389 -0.2385  0.5147 -0.6942 -0.1651  0.1956 -0.0757 -0.4346  0.1995
#> 
#> Columns 10 to 18 -0.5749 -0.2482  0.7138  0.5591 -0.0564 -0.2561 -0.7119 -0.2263 -0.1713
#>   0.5589  0.4213  0.1974  0.3019 -0.4558  0.4529  0.0645  0.3129  0.0558
#>  -0.1634  0.0046 -0.1113 -0.4117 -0.2508  0.2835  0.1263 -0.1403  0.2300
#> 
#> Columns 19 to 20 -0.0693 -0.4724
#>   0.0856 -0.1832
#>  -0.2027  0.3731
#> 
#> (2,.,.) = 
#>  Columns 1 to 9 -0.0995  0.4661  0.0508 -0.1046 -0.6234  0.0702 -0.2974  0.0622  0.1838
#>  -0.3789  0.2820 -0.2073 -0.2903 -0.1876  0.2477  0.2295 -0.3757  0.2671
#>  -0.0732  0.4498 -0.3874 -0.3978 -0.2299  0.2338  0.1490 -0.2620  0.6278
#> 
#> Columns 10 to 18 -0.4097 -0.0530  0.4075 -0.3532 -0.2414 -0.1718  0.1625  0.0727 -0.1397
#>  -0.0966 -0.0552  0.3717 -0.0748  0.3183  0.0857  0.1956 -0.0692 -0.4201
#>  -0.0819 -0.3396  0.5400  0.0478  0.1245 -0.1068  0.0449  0.1172 -0.4559
#> 
#> Columns 19 to 20  0.0983  0.2485
#>   0.0664 -0.0674
#>   0.3425  0.2600
#> [ CPUFloatType{2,3,20} ][ grad_fn = <StackBackward0> ]
#>