Applies a multi-layer Elman RNN with \(\tanh\) or \(\mbox{ReLU}\) non-linearity to an input sequence.
Usage
nn_rnn(
  input_size,
  hidden_size,
  num_layers = 1,
  nonlinearity = NULL,
  bias = TRUE,
  batch_first = FALSE,
  dropout = 0,
  bidirectional = FALSE,
  ...
)
Arguments
- input_size
The number of expected features in the input x
- hidden_size
The number of features in the hidden state h
- num_layers
Number of recurrent layers. E.g., setting num_layers = 2 would mean stacking two RNNs together to form a stacked RNN, with the second RNN taking in outputs of the first RNN and computing the final results. Default: 1
- nonlinearity
The non-linearity to use. Can be either 'tanh' or 'relu'. Default: 'tanh'
- bias
If FALSE, then the layer does not use bias weights b_ih and b_hh. Default: TRUE
- batch_first
If TRUE, then the input and output tensors are provided as (batch, seq, feature). Default: FALSE (see the sketch after this list)
- dropout
If non-zero, introduces a Dropout layer on the outputs of each RNN layer except the last layer, with dropout probability equal to dropout. Default: 0
- bidirectional
If TRUE, becomes a bidirectional RNN. Default: FALSE
- ...
Other arguments that can be passed to the super class.
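A minimal sketch of the batch_first layout (an illustration, not part of the package's examples; the shapes follow the argument descriptions above):
rnn <- nn_rnn(10, 20, batch_first = TRUE)
input <- torch_randn(3, 5, 10) # (batch = 3, seq_len = 5, input_size = 10)
out <- rnn(input) # h_0 defaults to zeros
out[[1]]$shape # (3, 5, 20): output is returned batch-first as well
out[[2]]$shape # (1, 3, 20): h_n keeps the (num_layers * num_directions, batch, hidden_size) layout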
Details
For each element in the input sequence, each layer computes the following function:
$$ h_t = \tanh(W_{ih} x_t + b_{ih} + W_{hh} h_{(t-1)} + b_{hh}) $$
where \(h_t\) is the hidden state at time t, \(x_t\) is
the input at time t, and \(h_{(t-1)}\) is the hidden state of the
previous layer at time t-1 or the initial hidden state at time 0.
If nonlinearity is 'relu', then \(\mbox{ReLU}\) is used instead of
\(\tanh\).
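The recurrence can be checked by hand. The sketch below is an illustration, not part of the package's examples; it assumes the module registers its parameters in the order weight_ih, weight_hh, bias_ih, bias_hh:
rnn <- nn_rnn(4, 3, num_layers = 1)
x <- torch_randn(2, 1, 4) # (seq_len = 2, batch = 1, input_size = 4)
h0 <- torch_zeros(1, 1, 3)
p <- rnn$parameters # assumed order: weight_ih, weight_hh, bias_ih, bias_hh
h1 <- torch_tanh(
  torch_matmul(x[1, , ], p[[1]]$t()) + p[[3]] +
  torch_matmul(h0[1, , ], p[[2]]$t()) + p[[4]]
)
out <- rnn(x, h0)
# out[[1]][1, , ] should match h1 up to floating point error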
Inputs
- input of shape (seq_len, batch, input_size): tensor containing the features of the input sequence. The input can also be a packed variable length sequence.
- h_0 of shape (num_layers * num_directions, batch, hidden_size): tensor containing the initial hidden state for each element in the batch. Defaults to zero if not provided. If the RNN is bidirectional, num_directions should be 2, else it should be 1.
Outputs
- output of shape (seq_len, batch, num_directions * hidden_size): tensor containing the output features (h_t) from the last layer of the RNN, for each t. If an nn_packed_sequence has been given as the input, the output will also be a packed sequence. For the unpacked case, the directions can be separated using output$view(seq_len, batch, num_directions, hidden_size), with forward and backward being direction 0 and 1 respectively. Similarly, the directions can be separated in the packed case. (See the sketch after this list.)
- h_n of shape (num_layers * num_directions, batch, hidden_size): tensor containing the hidden state for t = seq_len. Like output, the layers can be separated using h_n$view(num_layers, num_directions, batch, hidden_size).
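A sketch of the bidirectional case (an illustration, not part of the package's examples; the view() call mirrors the description above):
rnn <- nn_rnn(10, 20, num_layers = 2, bidirectional = TRUE)
input <- torch_randn(5, 3, 10)
res <- rnn(input) # h_0 defaults to zeros
res[[1]]$shape # (5, 3, 40) = (seq_len, batch, num_directions * hidden_size)
res[[2]]$shape # (4, 3, 20) = (num_layers * num_directions, batch, hidden_size)
dirs <- res[[1]]$view(c(5, 3, 2, 20)) # forward is direction 0, backward direction 1 (R indices 1 and 2)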
Shape
- Input1: \((L, N, H_{in})\) tensor containing input features, where \(H_{in} = \mbox{input\_size}\) and L represents the sequence length.
- Input2: \((S, N, H_{out})\) tensor containing the initial hidden state for each element in the batch, where \(S = \mbox{num\_layers} * \mbox{num\_directions}\) and \(H_{out} = \mbox{hidden\_size}\). Defaults to zero if not provided. If the RNN is bidirectional, num_directions should be 2, else it should be 1.
- Output1: \((L, N, H_{all})\) tensor, where \(H_{all} = \mbox{num\_directions} * \mbox{hidden\_size}\)
- Output2: \((S, N, H_{out})\) tensor containing the next hidden state for each element in the batch
Attributes
- weight_ih_l[k]: the learnable input-hidden weights of the k-th layer, of shape (hidden_size, input_size) for k = 0. Otherwise, the shape is (hidden_size, num_directions * hidden_size)
- weight_hh_l[k]: the learnable hidden-hidden weights of the k-th layer, of shape (hidden_size, hidden_size)
- bias_ih_l[k]: the learnable input-hidden bias of the k-th layer, of shape (hidden_size)
- bias_hh_l[k]: the learnable hidden-hidden bias of the k-th layer, of shape (hidden_size)
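These can be inspected through the module's named parameter list (a sketch; the exact names are whatever the module registers):
rnn <- nn_rnn(10, 20, num_layers = 2)
names(rnn$parameters) # weight_ih_l*, weight_hh_l*, bias_ih_l*, bias_hh_l*
lapply(rnn$parameters, function(p) p$shape)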
Note
All the weights and biases are initialized from \(\mathcal{U}(-\sqrt{k}, \sqrt{k})\) where \(k = \frac{1}{\mbox{hidden\_size}}\)
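A quick sanity check of the initialization bound (a sketch, not part of the package's examples):
rnn <- nn_rnn(10, 20)
bound <- sqrt(1 / 20) # sqrt(k) with k = 1 / hidden_size
max_abs <- max(sapply(rnn$parameters, function(p) as.numeric(p$abs()$max())))
max_abs <= bound # TRUE: all weights and biases lie within (-sqrt(k), sqrt(k))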
Examples
if (torch_is_installed()) {
  # 2-layer RNN: 10 input features, 20-dimensional hidden state
  rnn <- nn_rnn(10, 20, 2)
  # input of shape (seq_len = 5, batch = 3, input_size = 10)
  input <- torch_randn(5, 3, 10)
  # initial hidden state of shape (num_layers * num_directions = 2, batch = 3, hidden_size = 20)
  h0 <- torch_randn(2, 3, 20)
  rnn(input, h0)
}
#> [[1]]
#> torch_tensor
#> (1,.,.) =
#> Columns 1 to 9 0.5336 0.7385 -0.1702 0.8946 0.5489 0.4577 -0.6100 0.3837 -0.2113
#> 0.5149 0.5170 -0.3932 0.2605 -0.7742 0.6378 -0.8764 -0.6030 0.6703
#> -0.6465 -0.7565 0.2658 -0.4789 -0.6121 -0.7822 -0.6875 -0.2040 0.8524
#>
#> Columns 10 to 18 -0.4313 -0.3112 0.5503 -0.4635 0.4659 0.4330 -0.4477 -0.4642 -0.1739
#> 0.6729 0.3674 -0.7854 -0.3010 -0.0329 0.6652 -0.6028 -0.3420 -0.7138
#> -0.4576 0.4652 -0.1436 -0.8748 -0.3997 0.5135 0.5969 -0.5472 -0.7552
#>
#> Columns 19 to 20 -0.1085 -0.8933
#> 0.2050 -0.7869
#> 0.4866 -0.8155
#>
#> (2,.,.) =
#> Columns 1 to 9 -0.3378 0.5512 -0.6012 -0.1774 0.0709 0.3978 -0.5676 0.7610 0.2318
#> -0.6532 -0.2849 -0.2463 0.1308 -0.3581 -0.2793 -0.3553 -0.0813 0.5263
#> -0.6874 0.0305 0.5229 0.5767 0.3622 -0.3725 0.2080 0.0476 0.2523
#>
#> Columns 10 to 18 -0.0127 0.5236 -0.1666 -0.2486 -0.4238 0.1061 0.1221 0.0427 0.0751
#> -0.4802 -0.3173 0.0086 -0.3551 0.0449 0.5629 -0.2253 -0.8216 -0.3838
#> 0.1921 -0.2343 0.0920 -0.5654 -0.2106 0.2449 -0.4299 -0.2227 -0.1978
#>
#> Columns 19 to 20 -0.0165 -0.5769
#> -0.5177 0.0348
#> -0.5863 0.0111
#>
#> (3,.,.) =
#> Columns 1 to 9 -0.3301 -0.0342 -0.2219 0.5956 -0.2213 -0.2874 0.2217 -0.2680 0.3362
#> -0.7505 -0.0707 0.3008 0.5669 -0.2736 -0.5126 0.5113 -0.0566 0.4427
#> -0.3840 0.2430 0.4328 0.5987 -0.1425 -0.2271 -0.1556 0.2931 0.0870
#> ... [the output was truncated (use n=-1 to disable)]
#> [ CPUFloatType{5,3,20} ][ grad_fn = <StackBackward0> ]
#>
#> [[2]]
#> torch_tensor
#> (1,.,.) =
#> Columns 1 to 9 0.4746 0.2448 0.2263 -0.3123 0.1691 -0.0862 -0.2115 0.1364 0.3606
#> -0.0427 -0.2690 -0.1439 0.1881 0.3559 -0.7563 -0.7234 -0.3362 0.3827
#> -0.3051 -0.2255 0.5426 -0.4946 0.3450 -0.2787 0.1788 0.2281 0.6788
#>
#> Columns 10 to 18 -0.0465 -0.3255 -0.0975 0.1391 0.5480 -0.2833 0.0816 -0.5084 0.0402
#> 0.2749 -0.0250 -0.6497 0.3652 -0.4857 -0.3534 0.0736 -0.4114 -0.6087
#> -0.3513 -0.2443 0.0887 0.4410 -0.5391 0.2359 0.8310 0.4941 0.7023
#>
#> Columns 19 to 20 -0.0271 0.0505
#> -0.3850 0.5837
#> -0.9213 -0.6088
#>
#> (2,.,.) =
#> Columns 1 to 9 -0.2549 0.1756 0.2124 0.2662 -0.0108 -0.1531 -0.3832 0.1307 0.3121
#> 0.2063 0.3140 -0.0864 0.2344 0.1531 -0.0108 -0.2327 0.1950 0.3488
#> 0.0135 -0.0336 0.2284 0.6409 -0.0711 -0.2831 0.3964 -0.1305 0.6077
#>
#> Columns 10 to 18 0.2224 0.1722 -0.4920 -0.1563 -0.0371 -0.3329 0.0425 -0.1240 -0.3245
#> -0.4126 -0.1157 -0.4409 -0.0087 0.2259 0.0846 -0.2070 -0.5810 -0.1939
#> 0.2366 0.2339 -0.3316 -0.6490 -0.1148 0.1552 0.2948 0.0804 0.2170
#>
#> Columns 19 to 20 0.0610 -0.3165
#> -0.1835 -0.5385
#> 0.0863 -0.3652
#> [ CPUFloatType{2,3,20} ][ grad_fn = <StackBackward0> ]
#>