Applies a multi-layer Elman RNN with \(\tanh\) or \(\mbox{ReLU}\) non-linearity to an input sequence.

Usage

nn_rnn(
  input_size,
  hidden_size,
  num_layers = 1,
  nonlinearity = NULL,
  bias = TRUE,
  batch_first = FALSE,
  dropout = 0,
  bidirectional = FALSE,
  ...
)

Arguments

input_size

The number of expected features in the input x

hidden_size

The number of features in the hidden state h

num_layers

Number of recurrent layers. E.g., setting num_layers=2 would mean stacking two RNNs together to form a stacked RNN, with the second RNN taking in outputs of the first RNN and computing the final results. Default: 1

nonlinearity

The non-linearity to use. Can be either 'tanh' or 'relu'. Default: 'tanh' (the NULL in the usage falls back to this)

bias

If FALSE, then the layer does not use bias weights b_ih and b_hh. Default: TRUE

batch_first

If TRUE, then the input and output tensors are provided as (batch, seq, feature). Default: FALSE

dropout

If non-zero, introduces a Dropout layer on the outputs of each RNN layer except the last layer, with dropout probability equal to dropout. Default: 0

bidirectional

If TRUE, becomes a bidirectional RNN. Default: FALSE

...

Other arguments that can be passed to the super class.
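As a quick sketch of how these arguments combine (illustrative sizes only, not part of the original reference):

if (torch_is_installed()) {
  # two stacked layers with ReLU instead of the default tanh
  rnn_relu <- nn_rnn(10, 20, num_layers = 2, nonlinearity = "relu")
  # bidirectional, batch-first variant with dropout between the two layers
  rnn_bi <- nn_rnn(10, 20, num_layers = 2, dropout = 0.2,
                   batch_first = TRUE, bidirectional = TRUE)
}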

Details

For each element in the input sequence, each layer computes the following function:

$$ h_t = \tanh(W_{ih} x_t + b_{ih} + W_{hh} h_{(t-1)} + b_{hh}) $$

where \(h_t\) is the hidden state at time t, \(x_t\) is the input at time t, and \(h_{(t-1)}\) is the hidden state of the previous layer at time t-1 or the initial hidden state at time 0. If nonlinearity is 'relu', then \(\mbox{ReLU}\) is used instead of \(\tanh\).
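To make this recurrence concrete, here is a small sketch that recomputes one step by hand and compares it against the module. It assumes the parameters come back in the weight_ih, weight_hh, bias_ih, bias_hh order listed under Attributes below:

if (torch_is_installed()) {
  rnn <- nn_rnn(input_size = 4, hidden_size = 3)
  p <- rnn$parameters                 # assumed order: W_ih, W_hh, b_ih, b_hh
  x <- torch_randn(1, 1, 4)           # one step, one batch element
  h0 <- torch_zeros(1, 1, 3)
  res <- rnn(x, h0)
  # h_1 = tanh(W_ih x_1 + b_ih + W_hh h_0 + b_hh)
  h1 <- torch_tanh(
    torch_matmul(p[[1]], x[1, 1, ]) + p[[3]] +
      torch_matmul(p[[2]], h0[1, 1, ]) + p[[4]]
  )
  # h1 should match res[[2]][1, 1, ] up to floating point error
}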

Inputs

  • input of shape (seq_len, batch, input_size): tensor containing the features of the input sequence. The input can also be a packed variable length sequence.

  • h_0 of shape (num_layers * num_directions, batch, hidden_size): tensor containing the initial hidden state for each element in the batch. Defaults to zero if not provided. If the RNN is bidirectional, num_directions should be 2, else it should be 1.
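A minimal sketch of the packed variable length case mentioned above, assuming the pack helper nn_utils_rnn_pack_padded_sequence() and illustrative sizes:

if (torch_is_installed()) {
  rnn <- nn_rnn(10, 20)
  x <- torch_randn(5, 3, 10)          # padded batch: (seq_len, batch, input_size)
  lengths <- torch_tensor(c(5, 3, 2), dtype = torch_long())  # true lengths, longest first
  packed <- nn_utils_rnn_pack_padded_sequence(x, lengths)
  res <- rnn(packed)                  # the output is then packed as well
}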

Outputs

  • output of shape (seq_len, batch, num_directions * hidden_size): tensor containing the output features (h_t) from the last layer of the RNN, for each t. If a packed sequence has been given as the input, the output will also be a packed sequence. For the unpacked case, the directions can be separated using output$view(c(seq_len, batch, num_directions, hidden_size)), with the forward and backward directions at index 1 and 2 respectively. Similarly, the directions can be separated in the packed case.

  • h_n of shape (num_layers * num_directions, batch, hidden_size): tensor containing the hidden state for t = seq_len. Like output, the layers can be separated using h_n$view(c(num_layers, num_directions, batch, hidden_size)).
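A sketch of separating the directions as described above, for a single bidirectional layer (sizes illustrative; indices are R's 1-based):

if (torch_is_installed()) {
  birnn <- nn_rnn(10, 20, bidirectional = TRUE)
  x <- torch_randn(5, 3, 10)
  res <- birnn(x, torch_zeros(2, 3, 20))
  output <- res[[1]]                  # (5, 3, 2 * 20)
  dirs <- output$view(c(5, 3, 2, 20))
  forward  <- dirs[ , , 1, ]          # first direction
  backward <- dirs[ , , 2, ]          # second direction
}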

Shape

  • Input1: \((L, N, H_{in})\) tensor containing input features, where \(H_{in}=\mbox{input\_size}\), L is the sequence length, and N is the batch size.

  • Input2: \((S, N, H_{out})\) tensor containing the initial hidden state for each element in the batch, where \(S=\mbox{num\_layers} * \mbox{num\_directions}\) and \(H_{out}=\mbox{hidden\_size}\). Defaults to zero if not provided. If the RNN is bidirectional, num_directions should be 2, else it should be 1.

  • Output1: \((L, N, H_{all})\) where \(H_{all}=\mbox{num\_directions} * \mbox{hidden\_size}\)

  • Output2: \((S, N, H_{out})\) tensor containing the next hidden state for each element in the batch
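A sketch of how batch_first = TRUE changes these shapes (note that h_0 keeps its \((S, N, H_{out})\) layout; sizes illustrative):

if (torch_is_installed()) {
  rnn_bf <- nn_rnn(10, 20, batch_first = TRUE)
  x <- torch_randn(3, 5, 10)              # (N, L, H_in): batch of 3, length 5
  res <- rnn_bf(x, torch_zeros(1, 3, 20)) # h_0 stays (S, N, H_out)
  res[[1]]$shape                          # 3 5 20 -> (N, L, H_all)
  res[[2]]$shape                          # 1 3 20 -> (S, N, H_out)
}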

Attributes

  • weight_ih_l[k]: the learnable input-hidden weights of the k-th layer, of shape (hidden_size, input_size) for k = 0. Otherwise, the shape is (hidden_size, num_directions * hidden_size)

  • weight_hh_l[k]: the learnable hidden-hidden weights of the k-th layer, of shape (hidden_size, hidden_size)

  • bias_ih_l[k]: the learnable input-hidden bias of the k-th layer, of shape (hidden_size)

  • bias_hh_l[k]: the learnable hidden-hidden bias of the k-th layer, of shape (hidden_size)
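A short sketch for inspecting these attributes through the module's parameters list (the exact names are assumed to follow the pattern above):

if (torch_is_installed()) {
  rnn <- nn_rnn(10, 20, num_layers = 2)
  names(rnn$parameters)       # weight_ih_l[k] / weight_hh_l[k] / bias_ih_l[k] / bias_hh_l[k]
  rnn$parameters[[1]]$shape   # 20 10 -> (hidden_size, input_size) for the first layer
}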

Note

All the weights and biases are initialized from \(\mathcal{U}(-\sqrt{k}, \sqrt{k})\) where \(k = \frac{1}{\mbox{hidden\_size}}\).
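A sketch that checks this initialization empirically (a draw from \(\mathcal{U}(-\sqrt{k}, \sqrt{k})\) should have absolute value at most \(\sqrt{k}\), for every registered parameter):

if (torch_is_installed()) {
  rnn <- nn_rnn(10, 20)
  k <- 1 / 20   # 1 / hidden_size
  # every freshly initialized value should lie within [-sqrt(k), sqrt(k)]
  all(sapply(rnn$parameters, function(p) as.numeric(p$abs()$max()) <= sqrt(k)))
}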

Examples

if (torch_is_installed()) {
rnn <- nn_rnn(10, 20, 2)       # input_size = 10, hidden_size = 20, num_layers = 2
input <- torch_randn(5, 3, 10) # (seq_len, batch, input_size)
h0 <- torch_randn(2, 3, 20)    # (num_layers * num_directions, batch, hidden_size)
rnn(input, h0)                 # returns list(output, h_n)
}
#> [[1]]
#> torch_tensor
#> (1,.,.) = 
#> Columns 1 to 9 -0.8476  0.4115  0.2513 -0.6438  0.5592 -0.5318 -0.6618  0.4586 -0.4675
#>  -0.3209 -0.2483 -0.1577 -0.2579  0.0053  0.5986  0.6924 -0.1729 -0.6738
#>  -0.6468  0.0180  0.5657 -0.6046  0.6248 -0.9138 -0.6930  0.6150 -0.2824
#> 
#> Columns 10 to 18  0.2926 -0.6202 -0.1154  0.3844 -0.3588  0.0045  0.0221  0.6370  0.6067
#>  -0.7502 -0.1328  0.3491 -0.3825 -0.7597 -0.6569 -0.0271 -0.1175  0.0269
#>  -0.8083 -0.3631 -0.7010  0.2975  0.8919  0.8661  0.2407  0.6370  0.4529
#> 
#> Columns 19 to 20 -0.7491 -0.3607
#>  -0.0163 -0.5270
#>  -0.7537 -0.4495
#> 
#> (2,.,.) = 
#> Columns 1 to 9 -0.1147  0.1432 -0.3715 -0.2027  0.4123 -0.1111  0.7461 -0.2196  0.1280
#>  -0.0272 -0.0354  0.4653 -0.1791 -0.0699  0.1484 -0.1128  0.2717 -0.0477
#>  -0.1396  0.8214 -0.0268 -0.2368  0.3647  0.0459  0.6670 -0.4271  0.2178
#> 
#> Columns 10 to 18 -0.7025 -0.0997  0.4239  0.4138  0.1809  0.1798  0.5686  0.4098 -0.5339
#>  -0.5640 -0.0761  0.3550 -0.0069 -0.3445  0.2611  0.1989 -0.0541 -0.1007
#>  -0.3236 -0.0370  0.4810 -0.0298  0.1993  0.3551  0.3946  0.4486  0.2391
#> 
#> Columns 19 to 20 -0.2634 -0.5910
#>  -0.4012 -0.2327
#>  -0.4067 -0.2275
#> 
#> (3,.,.) = 
#> Columns 1 to 9  0.2279  0.1958  0.0847 -0.6814  0.0865 -0.2849  0.0538  0.2767 -0.5583
#>  -0.2337  0.1077  0.1834 -0.1550  0.2051 -0.0011  0.2096 -0.2643 -0.3434
#>  -0.3860  0.4895  0.5996 -0.5834  0.0573  0.4018 -0.2528 -0.3145 -0.2389
#> ... [the output was truncated (use n=-1 to disable)]
#> [ CPUFloatType{5,3,20} ][ grad_fn = <StackBackward0> ]
#> 
#> [[2]]
#> torch_tensor
#> (1,.,.) = 
#> Columns 1 to 9 -0.3507 -0.2646 -0.3904  0.2377  0.4367  0.0065  0.4967 -0.2070  0.0475
#>   0.3286 -0.4935 -0.1450  0.1471  0.3940  0.1866  0.1273 -0.6692 -0.5823
#>   0.3948 -0.2338 -0.5940 -0.7796  0.8108 -0.1107  0.1127  0.1725  0.1222
#> 
#> Columns 10 to 18 -0.3822 -0.5559  0.3343 -0.4751 -0.4854  0.0242  0.5179 -0.2863 -0.0291
#>  -0.6446 -0.4245  0.3067  0.3435 -0.4124  0.0209  0.8421 -0.2769  0.2187
#>  -0.2594  0.2168 -0.5468  0.3331 -0.2213  0.0726 -0.6439 -0.1250  0.4247
#> 
#> Columns 19 to 20 -0.1885  0.3161
#>   0.1393  0.1977
#>  -0.5080 -0.5162
#> 
#> (2,.,.) = 
#> Columns 1 to 9 -0.1279  0.4619  0.0886 -0.0987  0.3316  0.1159 -0.1551 -0.3298 -0.1985
#>  -0.3244  0.2742  0.1401 -0.4428  0.1042 -0.1907  0.1265 -0.1189 -0.3784
#>  -0.5238 -0.2127  0.2202  0.6034  0.0503  0.3091  0.5493  0.0055 -0.1505
#> 
#> Columns 10 to 18 -0.5314 -0.5977  0.3802 -0.1914  0.3591  0.0194  0.6327  0.4566 -0.0971
#>  -0.5487 -0.5789  0.6008  0.2359  0.2486  0.0989  0.6479  0.3580 -0.3558
#>  -0.6220 -0.4878 -0.0455 -0.0677 -0.5418  0.0548  0.3823 -0.2283  0.2659
#> 
#> Columns 19 to 20 -0.3969 -0.2092
#>  -0.0346 -0.5525
#>   0.2150 -0.4823
#> [ CPUFloatType{2,3,20} ][ grad_fn = <StackBackward0> ]
#>