Applies a multi-layer Elman RNN with \(\tanh\) or \(\mbox{ReLU}\) non-linearity to an input sequence.
Usage
nn_rnn(
input_size,
hidden_size,
num_layers = 1,
nonlinearity = NULL,
bias = TRUE,
batch_first = FALSE,
dropout = 0,
bidirectional = FALSE,
...
)
Arguments
- input_size
The number of expected features in the input x
- hidden_size
The number of features in the hidden state h
- num_layers
Number of recurrent layers. E.g., setting num_layers=2 would mean stacking two RNNs together to form a stacked RNN, with the second RNN taking in outputs of the first RNN and computing the final results. Default: 1
- nonlinearity
The non-linearity to use. Can be either 'tanh' or 'relu'. Default: 'tanh'
- bias
If FALSE, then the layer does not use bias weights b_ih and b_hh. Default: TRUE
- batch_first
If TRUE, then the input and output tensors are provided as (batch, seq, feature). Default: FALSE
- dropout
If non-zero, introduces a Dropout layer on the outputs of each RNN layer except the last layer, with dropout probability equal to dropout. Default: 0
- bidirectional
If TRUE, becomes a bidirectional RNN. Default: FALSE
- ...
other arguments that can be passed to the super class.
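As an illustration of the arguments above, a minimal construction sketch (the sizes and option values are arbitrary illustrations, not package defaults beyond what the Usage block shows):

library(torch)

# Illustrative (hypothetical) sizes; nonlinearity, dropout and bidirectional
# are the non-default options described above.
rnn <- nn_rnn(
  input_size = 10,
  hidden_size = 20,
  num_layers = 2,
  nonlinearity = "relu",
  dropout = 0.2,
  bidirectional = TRUE
)
rnn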
Details
For each element in the input sequence, each layer computes the following function:
$$ h_t = \tanh(W_{ih} x_t + b_{ih} + W_{hh} h_{(t-1)} + b_{hh}) $$
where \(h_t\) is the hidden state at time t, \(x_t\) is the input at time t, and \(h_{(t-1)}\) is the hidden state of the previous layer at time t-1 or the initial hidden state at time 0.
If nonlinearity is 'relu', then \(\mbox{ReLU}\) is used instead of \(\tanh\).
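The update rule can be reproduced by hand. A minimal sketch, assuming small hand-made weight tensors (w_ih, w_hh, b_ih, b_hh are illustrative names, not parameters pulled from an nn_rnn module):

library(torch)

input_size <- 4
hidden_size <- 3

# Hand-made (hypothetical) weights, just to mirror the formula above
w_ih <- torch_randn(hidden_size, input_size)
w_hh <- torch_randn(hidden_size, hidden_size)
b_ih <- torch_randn(hidden_size)
b_hh <- torch_randn(hidden_size)

x_t <- torch_randn(input_size)      # input at time t
h_prev <- torch_zeros(hidden_size)  # hidden state at time t - 1

# h_t = tanh(W_ih x_t + b_ih + W_hh h_{t-1} + b_hh)
h_t <- torch_tanh(
  torch_matmul(w_ih, x_t) + b_ih + torch_matmul(w_hh, h_prev) + b_hh
)
h_t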
Inputs
- input of shape (seq_len, batch, input_size): tensor containing the features of the input sequence. The input can also be a packed variable length sequence.
- h_0 of shape (num_layers * num_directions, batch, hidden_size): tensor containing the initial hidden state for each element in the batch. Defaults to zero if not provided. If the RNN is bidirectional, num_directions should be 2, else it should be 1.
Outputs
- output of shape (seq_len, batch, num_directions * hidden_size): tensor containing the output features (h_t) from the last layer of the RNN, for each t. If a nn_packed_sequence has been given as the input, the output will also be a packed sequence. For the unpacked case, the directions can be separated using output$view(seq_len, batch, num_directions, hidden_size), with forward and backward being direction 0 and 1 respectively. Similarly, the directions can be separated in the packed case.
- h_n of shape (num_layers * num_directions, batch, hidden_size): tensor containing the hidden state for t = seq_len. Like output, the layers can be separated using h_n$view(num_layers, num_directions, batch, hidden_size).
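A hedged sketch (sizes chosen only for illustration) of splitting the output of a bidirectional RNN into its two directions with $view(), as described above:

library(torch)

rnn_bi <- nn_rnn(10, 20, num_layers = 1, bidirectional = TRUE)
input <- torch_randn(5, 3, 10)  # (seq_len, batch, input_size)
res <- rnn_bi(input)            # h_0 defaults to zeros

output <- res[[1]]              # (seq_len, batch, num_directions * hidden_size)
h_n <- res[[2]]                 # (num_layers * num_directions, batch, hidden_size)

# Separate the two directions along a new dimension
dirs <- output$view(c(5, 3, 2, 20))
dim(dirs)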
Shape
- Input1: \((L, N, H_{in})\) tensor containing input features, where \(H_{in}=\mbox{input\_size}\) and L represents a sequence length.
- Input2: \((S, N, H_{out})\) tensor containing the initial hidden state for each element in the batch, where \(H_{out}=\mbox{hidden\_size}\) and \(S=\mbox{num\_layers} * \mbox{num\_directions}\). Defaults to zero if not provided. If the RNN is bidirectional, num_directions should be 2, else it should be 1.
- Output1: \((L, N, H_{all})\) where \(H_{all}=\mbox{num\_directions} * \mbox{hidden\_size}\)
- Output2: \((S, N, H_{out})\) tensor containing the next hidden state for each element in the batch
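For comparison, a small sketch (assuming the same illustrative sizes) of the batch_first = TRUE layout, where the batch dimension leads in the input and output tensors while the hidden state keeps the \((S, N, H_{out})\) layout:

library(torch)

rnn_bf <- nn_rnn(10, 20, num_layers = 2, batch_first = TRUE)
input <- torch_randn(3, 5, 10)  # (batch, seq_len, input_size)
res <- rnn_bf(input)

dim(res[[1]])  # output: (batch, seq_len, hidden_size)      -> 3 5 20
dim(res[[2]])  # h_n:    (num_layers, batch, hidden_size)   -> 2 3 20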
Attributes
- weight_ih_l[k]: the learnable input-hidden weights of the k-th layer, of shape (hidden_size, input_size) for k = 0. Otherwise, the shape is (hidden_size, num_directions * hidden_size)
- weight_hh_l[k]: the learnable hidden-hidden weights of the k-th layer, of shape (hidden_size, hidden_size)
- bias_ih_l[k]: the learnable input-hidden bias of the k-th layer, of shape (hidden_size)
- bias_hh_l[k]: the learnable hidden-hidden bias of the k-th layer, of shape (hidden_size)
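The attributes above can be inspected through the module's parameter list. A quick sketch that prints each parameter's name and shape rather than hard-coding any particular name:

library(torch)

rnn <- nn_rnn(10, 20, 2)
sapply(rnn$parameters, function(p) paste(dim(p), collapse = " x "))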
Note
All the weights and biases are initialized from \(\mathcal{U}(-\sqrt{k}, \sqrt{k})\) where \(k = \frac{1}{\mbox{hidden\_size}}\).
Examples
if (torch_is_installed()) {
  rnn <- nn_rnn(10, 20, 2)        # input_size = 10, hidden_size = 20, num_layers = 2
  input <- torch_randn(5, 3, 10)  # (seq_len, batch, input_size)
  h0 <- torch_randn(2, 3, 20)     # (num_layers * num_directions, batch, hidden_size)
  rnn(input, h0)                  # returns list(output, h_n)
}
#> [[1]]
#> torch_tensor
#> (1,.,.) =
#> Columns 1 to 9 0.6210 -0.0224 0.0897 -0.1271 0.3904 0.4547 0.0283 -0.2772 -0.6163
#> -0.1755 0.4043 0.3834 0.8691 0.3629 0.2580 -0.7042 0.7463 -0.9218
#> 0.1841 -0.2091 0.3950 0.6996 -0.1938 -0.3368 -0.0722 0.0966 0.5814
#>
#> Columns 10 to 18 -0.7900 0.6862 0.3829 -0.4672 -0.5027 -0.0302 0.1479 0.5411 0.8091
#> -0.2750 0.7744 0.3291 -0.6651 -0.7880 -0.8269 0.6105 -0.0585 0.8671
#> 0.3096 -0.6060 -0.7932 -0.7154 0.0669 -0.2380 0.4638 -0.7161 0.2325
#>
#> Columns 19 to 20 0.3557 -0.6474
#> 0.4251 -0.3812
#> 0.5579 0.4685
#>
#> (2,.,.) =
#> Columns 1 to 9 0.3790 -0.3426 0.1700 0.4097 0.5899 -0.0918 0.0235 -0.3307 0.1852
#> 0.7111 0.4291 0.6466 -0.1516 0.5628 -0.4047 -0.5729 -0.6285 0.3345
#> 0.3608 0.3466 0.6241 -0.1323 0.6284 -0.3303 0.2279 -0.2417 -0.3917
#>
#> Columns 10 to 18 -0.1727 -0.2049 -0.4917 -0.5548 -0.6877 0.1381 0.0294 -0.2666 0.1013
#> -0.1716 0.5030 -0.6041 -0.2076 -0.3981 0.4964 -0.4218 -0.1944 0.1945
#> 0.1343 0.5193 -0.0805 -0.2428 -0.5561 0.3652 0.1352 0.0685 0.3989
#>
#> Columns 19 to 20 0.5835 0.0378
#> 0.5256 0.3709
#> -0.2283 -0.0702
#>
#> (3,.,.) =
#> Columns 1 to 9 0.4857 -0.2846 0.0669 0.4627 0.6227 -0.5060 0.1117 -0.6604 0.3297
#> 0.0060 -0.2215 0.5456 0.0629 0.4363 -0.3732 0.6397 -0.0397 0.3483
#> -0.2677 -0.4605 0.5429 0.0710 0.1766 -0.5250 0.3277 0.0921 0.3027
#> ... [the output was truncated (use n=-1 to disable)]
#> [ CPUFloatType{5,3,20} ][ grad_fn = <StackBackward0> ]
#>
#> [[2]]
#> torch_tensor
#> (1,.,.) =
#> Columns 1 to 9 -0.5733 0.0030 -0.2568 -0.2353 -0.4492 0.6114 0.4159 0.2690 0.2425
#> -0.2804 -0.6375 -0.2451 -0.4571 0.6715 0.1292 0.8460 0.0487 -0.1788
#> 0.1642 0.1817 0.3654 0.2703 0.2841 0.0954 -0.3233 -0.2613 0.1110
#>
#> Columns 10 to 18 -0.6024 0.0637 0.3268 0.4739 -0.4387 -0.2510 -0.0992 0.4245 -0.2168
#> -0.6137 0.8388 0.1329 0.2408 0.3305 -0.1706 -0.1766 0.2808 0.0887
#> 0.4767 -0.1862 0.6478 0.2556 0.7990 0.2032 -0.2167 -0.4625 0.1031
#>
#> Columns 19 to 20 0.1843 -0.0013
#> -0.3344 -0.2012
#> -0.2496 0.1680
#>
#> (2,.,.) =
#> Columns 1 to 9 0.4788 0.2520 0.6893 0.1731 0.2744 -0.6801 0.1421 0.1439 -0.1920
#> 0.4706 -0.3598 0.4749 0.4556 0.3389 -0.7676 0.0278 -0.0828 0.2422
#> -0.2397 -0.2356 0.5201 0.0636 0.2354 -0.4438 0.5429 0.0326 0.3797
#>
#> Columns 10 to 18 -0.2325 0.6530 0.0345 -0.2001 -0.2965 0.4507 0.2269 0.1432 0.2832
#> 0.4363 0.4829 0.0018 -0.3689 -0.3846 0.3125 -0.0474 0.1765 0.0652
#> -0.0683 0.0523 -0.2209 -0.3107 -0.6251 0.6452 -0.0359 -0.3465 0.2780
#>
#> Columns 19 to 20 0.1379 -0.1165
#> 0.0557 0.2462
#> 0.2637 0.1195
#> [ CPUFloatType{2,3,20} ][ grad_fn = <StackBackward0> ]
#>