Applies a multi-layer Elman RNN with \(\tanh\) or \(\mbox{ReLU}\) non-linearity to an input sequence.
Usage
nn_rnn(
  input_size,
  hidden_size,
  num_layers = 1,
  nonlinearity = NULL,
  bias = TRUE,
  batch_first = FALSE,
  dropout = 0,
  bidirectional = FALSE,
  ...
)
Arguments
- input_size: The number of expected features in the input x.
- hidden_size: The number of features in the hidden state h.
- num_layers: Number of recurrent layers. E.g., setting num_layers = 2 would mean stacking two RNNs together to form a stacked RNN, with the second RNN taking in outputs of the first RNN and computing the final results. Default: 1.
- nonlinearity: The non-linearity to use. Can be either 'tanh' or 'relu'. Default: 'tanh'.
- bias: If FALSE, then the layer does not use bias weights b_ih and b_hh. Default: TRUE.
- batch_first: If TRUE, then the input and output tensors are provided as (batch, seq, feature). Default: FALSE. (A shape sketch follows this list.)
- dropout: If non-zero, introduces a Dropout layer on the outputs of each RNN layer except the last layer, with dropout probability equal to dropout. Default: 0.
- bidirectional: If TRUE, becomes a bidirectional RNN. Default: FALSE.
- ...: other arguments that can be passed to the super class.
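To make the interaction between batch_first and bidirectional concrete, here is a minimal sketch (the sizes are arbitrary and not part of the documented usage):

library(torch)
rnn <- nn_rnn(
  input_size = 10, hidden_size = 20,
  batch_first = TRUE, bidirectional = TRUE
)
x <- torch_randn(3, 5, 10)  # (batch, seq, feature) because batch_first = TRUE
res <- rnn(x)
res[[1]]$shape  # output: (3, 5, 40) = (batch, seq, num_directions * hidden_size)
res[[2]]$shape  # h_n: (2, 3, 20) = (num_layers * num_directions, batch, hidden_size)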
Details
For each element in the input sequence, each layer computes the following function:
$$ h_t = \tanh(W_{ih} x_t + b_{ih} + W_{hh} h_{(t-1)} + b_{hh}) $$
where \(h_t\) is the hidden state at time \(t\), \(x_t\) is the input at time \(t\), and \(h_{(t-1)}\) is the hidden state of the previous layer at time \(t-1\) or the initial hidden state at time 0. If nonlinearity is 'relu', then \(\mbox{ReLU}\) is used instead of \(\tanh\).
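The recurrence can be written out directly. Below is a minimal sketch of a single Elman step; W_ih, W_hh, b_ih and b_hh are illustrative stand-ins, not the module's internal parameters:

library(torch)
input_size <- 4
hidden_size <- 6
W_ih <- torch_randn(hidden_size, input_size)   # input-hidden weights
W_hh <- torch_randn(hidden_size, hidden_size)  # hidden-hidden weights
b_ih <- torch_zeros(hidden_size)
b_hh <- torch_zeros(hidden_size)
x_t <- torch_randn(input_size)      # input at time t
h_prev <- torch_zeros(hidden_size)  # initial hidden state at time 0
# one step of the recurrence from the formula above
h_t <- torch_tanh(
  torch_matmul(W_ih, x_t) + b_ih + torch_matmul(W_hh, h_prev) + b_hh
)
h_t$shape  # (hidden_size)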
Inputs
- input of shape (seq_len, batch, input_size): tensor containing the features of the input sequence. The input can also be a packed variable-length sequence.
- h_0 of shape (num_layers * num_directions, batch, hidden_size): tensor containing the initial hidden state for each element in the batch. Defaults to zero if not provided. If the RNN is bidirectional, num_directions should be 2, else it should be 1.
Outputs
- output of shape (seq_len, batch, num_directions * hidden_size): tensor containing the output features (h_t) from the last layer of the RNN, for each t. If an nn_packed_sequence has been given as the input, the output will also be a packed sequence. For the unpacked case, the directions can be separated using output$view(seq_len, batch, num_directions, hidden_size), with forward and backward being direction 0 and 1 respectively. Similarly, the directions can be separated in the packed case.
- h_n of shape (num_layers * num_directions, batch, hidden_size): tensor containing the hidden state for t = seq_len. Like output, the layers can be separated using h_n$view(num_layers, num_directions, batch, hidden_size). (A direction-splitting sketch follows.)
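As an illustration of the view() recipes above, here is a sketch assuming a single bidirectional layer with arbitrary sizes:

library(torch)
seq_len <- 5
batch <- 3
hidden_size <- 20
rnn <- nn_rnn(10, hidden_size, bidirectional = TRUE)
res <- rnn(torch_randn(seq_len, batch, 10))
out <- res[[1]]  # (seq_len, batch, 2 * hidden_size)
dirs <- out$view(c(seq_len, batch, 2, hidden_size))
dirs[ , , 1, ]$shape  # forward direction (direction 0 above; R indexing is 1-based)
dirs[ , , 2, ]$shape  # backward direction
h_n <- res[[2]]
h_n$view(c(1, 2, batch, hidden_size))$shape  # (num_layers, num_directions, batch, hidden_size)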
Shape
- Input1: \((L, N, H_{in})\) tensor containing input features, where \(H_{in}=\mbox{input\_size}\) and \(L\) is the sequence length.
- Input2: \((S, N, H_{out})\) tensor containing the initial hidden state for each element in the batch, where \(H_{out}=\mbox{hidden\_size}\) and \(S=\mbox{num\_layers} * \mbox{num\_directions}\). Defaults to zero if not provided. If the RNN is bidirectional, num_directions should be 2, else it should be 1.
- Output1: \((L, N, H_{all})\) where \(H_{all}=\mbox{num\_directions} * \mbox{hidden\_size}\).
- Output2: \((S, N, H_{out})\) tensor containing the next hidden state for each element in the batch.
Attributes
- weight_ih_l[k]: the learnable input-hidden weights of the k-th layer, of shape (hidden_size, input_size) for k = 0. Otherwise, the shape is (hidden_size, num_directions * hidden_size).
- weight_hh_l[k]: the learnable hidden-hidden weights of the k-th layer, of shape (hidden_size, hidden_size).
- bias_ih_l[k]: the learnable input-hidden bias of the k-th layer, of shape (hidden_size).
- bias_hh_l[k]: the learnable hidden-hidden bias of the k-th layer, of shape (hidden_size).
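One way to check these shapes on a concrete module is to inspect rnn$parameters, the named list of learnable tensors exposed by nn_module; the exact names are assumed here to follow the weight_ih_l[k] / bias_ih_l[k] pattern above:

library(torch)
rnn <- nn_rnn(10, 20, num_layers = 2)
lapply(rnn$parameters, function(p) p$shape)  # names and shapes of all learnable parameters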
Note
All the weights and biases are initialized from \(\mathcal{U}(-\sqrt{k}, \sqrt{k})\), where \(k = \frac{1}{\mbox{hidden\_size}}\).
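To make the initialization range concrete, a small empirical check (a sketch, not a proof; it only confirms all values fall inside the documented support):

library(torch)
hidden_size <- 20
rnn <- nn_rnn(10, hidden_size, 2)
k <- 1 / hidden_size
max_abs <- max(sapply(rnn$parameters, function(p) p$abs()$max()$item()))
max_abs <= sqrt(k)  # TRUE: every weight and bias lies in [-sqrt(k), sqrt(k)]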
Examples
if (torch_is_installed()) {
rnn <- nn_rnn(10, 20, 2)
input <- torch_randn(5, 3, 10)
h0 <- torch_randn(2, 3, 20)
rnn(input, h0)
}
#> [[1]]
#> torch_tensor
#> (1,.,.) =
#> Columns 1 to 9 -0.6672 -0.5576 -0.7882 -0.5704 -0.1213 -0.1555 -0.4988 -0.2041 0.1730
#> 0.3708 0.4823 -0.3389 0.0649 0.0870 0.3918 0.2740 -0.1167 0.6356
#> -0.2963 0.2256 -0.8713 -0.4988 -0.4066 -0.6238 -0.5727 -0.8091 0.0541
#>
#> Columns 10 to 18 0.4516 0.2719 -0.3985 -0.9430 -0.7006 0.1088 -0.3734 -0.2566 0.3061
#> 0.3111 -0.9668 -0.3981 -0.5069 0.7400 -0.8135 -0.3103 -0.2433 -0.0326
#> 0.1348 0.4815 0.8166 0.1176 -0.6106 -0.6241 -0.0798 -0.2404 0.3189
#>
#> Columns 19 to 20 -0.6957 -0.3317
#> -0.5450 -0.1157
#> -0.6601 -0.0846
#>
#> (2,.,.) =
#> Columns 1 to 9 0.6531 -0.1276 -0.4058 -0.2371 -0.1864 -0.0673 -0.1344 -0.0563 -0.0762
#> 0.3822 0.7262 0.2071 -0.0514 -0.3563 -0.0384 -0.3935 -0.2740 0.5408
#> 0.7960 0.0880 -0.0993 -0.2839 -0.1842 -0.4034 0.0735 -0.0556 -0.1229
#>
#> Columns 10 to 18 0.0177 0.0306 -0.2214 -0.0196 0.1127 -0.3426 -0.4222 -0.2350 0.0771
#> 0.2666 -0.6826 0.0378 0.0231 -0.0792 -0.7421 -0.2437 -0.2958 0.6728
#> 0.4232 0.3214 -0.5402 -0.0659 -0.0868 -0.7073 -0.5748 -0.5727 0.2702
#>
#> Columns 19 to 20 -0.5377 0.5466
#> -0.4177 0.5886
#> 0.1673 0.3668
#>
#> (3,.,.) =
#> Columns 1 to 9 0.4817 0.3553 0.0310 0.2629 -0.2499 -0.3908 0.2113 -0.1869 0.3778
#> 0.0696 -0.0065 0.0965 -0.0875 -0.1411 -0.2435 -0.0177 0.2115 0.8363
#> 0.4694 0.2319 -0.1121 -0.0015 -0.2128 -0.5588 0.0025 0.0655 0.5888
#> ... [the output was truncated (use n=-1 to disable)]
#> [ CPUFloatType{5,3,20} ][ grad_fn = <StackBackward0> ]
#>
#> [[2]]
#> torch_tensor
#> (1,.,.) =
#> Columns 1 to 9 -0.3043 -0.2065 0.3745 -0.3587 0.4909 0.1480 -0.0934 -0.0568 -0.5119
#> -0.1698 -0.4295 0.1618 0.8840 0.8403 0.1521 -0.0194 0.0346 -0.5030
#> -0.5914 -0.2615 0.0181 0.3092 0.0269 -0.2392 0.3054 -0.1592 -0.2449
#>
#> Columns 10 to 18 -0.2044 0.1393 0.0223 -0.6050 0.0864 -0.2030 -0.1884 0.1294 -0.5688
#> -0.7192 0.2503 -0.4436 -0.5366 0.3780 -0.7444 -0.3009 0.0491 -0.6804
#> -0.7611 0.5710 -0.3136 -0.6655 0.5523 -0.2799 0.0534 -0.0332 -0.3585
#>
#> Columns 19 to 20 0.1900 -0.1943
#> 0.4012 0.5662
#> 0.1809 -0.0002
#>
#> (2,.,.) =
#> Columns 1 to 9 -0.2490 0.1279 0.0560 -0.0352 -0.2997 -0.3424 0.1478 0.0832 0.5699
#> 0.2878 -0.2722 0.3503 0.0755 -0.2243 -0.2110 -0.0385 0.2922 0.6630
#> 0.1584 -0.0420 -0.1141 -0.1354 -0.2968 -0.3121 -0.0339 0.1752 0.3961
#>
#> Columns 10 to 18 0.5766 -0.4965 0.0478 0.0154 0.0152 -0.4690 -0.5169 -0.3258 -0.1789
#> 0.5336 -0.7860 0.3072 0.5158 0.2224 -0.5046 -0.2176 -0.1864 0.0163
#> 0.4292 -0.7345 0.3020 0.3357 0.2845 -0.4391 0.0123 -0.2316 -0.1739
#>
#> Columns 19 to 20 0.2999 0.2000
#> -0.0739 0.1988
#> 0.0014 0.0869
#> [ CPUFloatType{2,3,20} ][ grad_fn = <StackBackward0> ]
#>