Applies a multi-layer Elman RNN with \(\tanh\) or \(\mbox{ReLU}\) non-linearity to an input sequence.
Usage
nn_rnn(
input_size,
hidden_size,
num_layers = 1,
nonlinearity = NULL,
bias = TRUE,
batch_first = FALSE,
dropout = 0,
bidirectional = FALSE,
...
)
Arguments
- input_size
The number of expected features in the input x
- hidden_size
The number of features in the hidden state h
- num_layers
Number of recurrent layers. E.g., setting num_layers = 2 would mean stacking two RNNs together to form a stacked RNN, with the second RNN taking in outputs of the first RNN and computing the final results. Default: 1
- nonlinearity
The non-linearity to use. Can be either 'tanh' or 'relu'. Default: 'tanh'
- bias
If FALSE, then the layer does not use bias weights b_ih and b_hh. Default: TRUE
- batch_first
If TRUE, then the input and output tensors are provided as (batch, seq, feature). Default: FALSE
- dropout
If non-zero, introduces a Dropout layer on the outputs of each RNN layer except the last layer, with dropout probability equal to dropout. Default: 0
- bidirectional
If TRUE, becomes a bidirectional RNN. Default: FALSE
- ...
Other arguments that can be passed to the super class.
Details
For each element in the input sequence, each layer computes the following function:
$$ h_t = \tanh(W_{ih} x_t + b_{ih} + W_{hh} h_{(t-1)} + b_{hh}) $$
where \(h_t\) is the hidden state at time \(t\), \(x_t\) is the input at time \(t\), and \(h_{(t-1)}\) is the hidden state of the previous layer at time \(t-1\) or the initial hidden state at time 0.
If nonlinearity is 'relu', then \(\mbox{ReLU}\) is used instead of \(\tanh\).
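As a minimal sketch of this recurrence (assuming the parameters are exposed through $parameters under the weight_ih_l0, weight_hh_l0, bias_ih_l0, bias_hh_l0 names listed under Attributes), the first time step of a single-layer RNN can be recomputed by hand:
if (torch_is_installed()) {
  rnn <- nn_rnn(input_size = 4, hidden_size = 3, num_layers = 1)
  x <- torch_randn(2, 1, 4)   # (seq_len, batch, input_size)
  h0 <- torch_zeros(1, 1, 3)  # (num_layers * num_directions, batch, hidden_size)
  out <- rnn(x, h0)

  p <- rnn$parameters
  # h_1 = tanh(W_ih x_1 + b_ih + W_hh h_0 + b_hh)
  h1 <- torch_tanh(
    x[1, , ]$matmul(p$weight_ih_l0$t()) + p$bias_ih_l0 +
      h0[1, , ]$matmul(p$weight_hh_l0$t()) + p$bias_hh_l0
  )
  torch_allclose(h1, out[[1]][1, , ])  # should be TRUE
}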
Inputs
- input of shape (seq_len, batch, input_size): tensor containing the features of the input sequence. The input can also be a packed variable length sequence.
- h_0 of shape (num_layers * num_directions, batch, hidden_size): tensor containing the initial hidden state for each element in the batch. Defaults to zero if not provided. If the RNN is bidirectional, num_directions should be 2, else it should be 1.
Outputs
- output of shape (seq_len, batch, num_directions * hidden_size): tensor containing the output features (h_t) from the last layer of the RNN, for each t. If an nn_packed_sequence has been given as the input, the output will also be a packed sequence. For the unpacked case, the directions can be separated using output$view(seq_len, batch, num_directions, hidden_size), with forward and backward being direction 0 and 1 respectively. Similarly, the directions can be separated in the packed case.
- h_n of shape (num_layers * num_directions, batch, hidden_size): tensor containing the hidden state for t = seq_len. Like output, the layers can be separated using h_n$view(num_layers, num_directions, batch, hidden_size).
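A minimal sketch of these output shapes for a bidirectional, two-layer RNN, separating the directions with $view() as described above:
if (torch_is_installed()) {
  rnn <- nn_rnn(10, 20, num_layers = 2, bidirectional = TRUE)
  input <- torch_randn(5, 3, 10)   # (seq_len, batch, input_size)
  h0 <- torch_zeros(4, 3, 20)      # num_layers * num_directions = 4
  res <- rnn(input, h0)
  output <- res[[1]]
  h_n <- res[[2]]

  output$shape  # 5 3 40 = (seq_len, batch, num_directions * hidden_size)
  h_n$shape     # 4 3 20 = (num_layers * num_directions, batch, hidden_size)

  # separate the directions; with R's 1-based indexing the forward direction is index 1
  dirs <- output$view(c(5, 3, 2, 20))
  dirs$shape    # 5 3 2 20
}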
Shape
- Input 1: \((L, N, H_{in})\) tensor containing input features, where \(H_{in} = \mbox{input\_size}\) and L represents the sequence length.
- Input 2: \((S, N, H_{out})\) tensor containing the initial hidden state for each element in the batch, where \(H_{out} = \mbox{hidden\_size}\) and \(S = \mbox{num\_layers} * \mbox{num\_directions}\). Defaults to zero if not provided. If the RNN is bidirectional, num_directions should be 2, else it should be 1.
- Output 1: \((L, N, H_{all})\) where \(H_{all} = \mbox{num\_directions} * \mbox{hidden\_size}\)
- Output 2: \((S, N, H_{out})\) tensor containing the next hidden state for each element in the batch
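The shapes above assume batch_first = FALSE. A minimal sketch with batch_first = TRUE, where input and output use (batch, seq, feature) instead:
if (torch_is_installed()) {
  rnn <- nn_rnn(10, 20, num_layers = 2, batch_first = TRUE)
  input <- torch_randn(3, 5, 10)  # (batch, seq_len, input_size)
  res <- rnn(input)               # h_0 defaults to zeros
  res[[1]]$shape  # 3 5 20: (batch, seq_len, num_directions * hidden_size)
  res[[2]]$shape  # 2 3 20: h_n keeps (num_layers * num_directions, batch, hidden_size)
}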
Attributes
- weight_ih_l[k]: the learnable input-hidden weights of the k-th layer, of shape (hidden_size, input_size) for k = 0. Otherwise, the shape is (hidden_size, num_directions * hidden_size).
- weight_hh_l[k]: the learnable hidden-hidden weights of the k-th layer, of shape (hidden_size, hidden_size)
- bias_ih_l[k]: the learnable input-hidden bias of the k-th layer, of shape (hidden_size)
- bias_hh_l[k]: the learnable hidden-hidden bias of the k-th layer, of shape (hidden_size)
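A minimal sketch (assuming the parameters are exposed through $parameters under the names above) that inspects the per-layer shapes:
if (torch_is_installed()) {
  rnn <- nn_rnn(10, 20, num_layers = 2)
  names(rnn$parameters)              # weight_ih_l0, weight_hh_l0, bias_ih_l0, ...
  rnn$parameters$weight_ih_l0$shape  # 20 10 = (hidden_size, input_size) for k = 0
  rnn$parameters$weight_ih_l1$shape  # 20 20 = (hidden_size, num_directions * hidden_size)
  rnn$parameters$bias_ih_l0$shape    # 20
}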
Note
All the weights and biases are initialized from \(\mathcal{U}(-\sqrt{k}, \sqrt{k})\) where \(k = \frac{1}{\mbox{hidden\_size}}\)
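A quick sanity-check sketch of this initialization: with hidden_size = 20, \(k = 1/20\), so every parameter should lie within \(\pm\sqrt{k} \approx 0.224\).
if (torch_is_installed()) {
  rnn <- nn_rnn(10, 20, 2)
  bound <- sqrt(1 / 20)
  all(sapply(rnn$parameters, function(p) (p$abs() <= bound)$all()$item()))  # TRUE
}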
Examples
if (torch_is_installed()) {
rnn <- nn_rnn(10, 20, 2)
input <- torch_randn(5, 3, 10)
h0 <- torch_randn(2, 3, 20)
rnn(input, h0)
}
#> [[1]]
#> torch_tensor
#> (1,.,.) =
#> Columns 1 to 9 -0.7455 0.3936 0.7036 -0.7209 0.3747 0.8376 -0.1435 0.1569 -0.3096
#> 0.2607 0.1905 0.0967 0.0384 0.3291 0.1909 -0.5733 -0.0478 -0.7249
#> -0.3773 -0.1591 -0.6558 -0.5121 -0.0243 -0.8414 0.7795 -0.6770 -0.2403
#>
#> Columns 10 to 18 0.3965 -0.8558 0.2749 -0.2519 0.1123 0.2284 0.0530 0.0230 0.1163
#> -0.0373 -0.1973 0.5682 -0.7684 -0.5830 0.2068 -0.1461 0.5972 -0.0966
#> -0.7668 0.0178 -0.8780 -0.6888 -0.2894 -0.0332 0.4125 0.1443 0.0043
#>
#> Columns 19 to 20 -0.5481 -0.8050
#> -0.4523 -0.6447
#> -0.5662 0.2916
#>
#> (2,.,.) =
#> Columns 1 to 9 0.7021 0.3962 0.1730 0.0157 0.1096 -0.7210 -0.2206 0.1991 -0.2494
#> 0.3598 0.6098 -0.1604 0.0575 0.1784 -0.5600 -0.4248 -0.2026 -0.6689
#> -0.0424 0.6223 -0.1803 -0.1454 0.1323 0.1302 0.2748 0.0623 -0.1467
#>
#> Columns 10 to 18 0.6750 0.4423 -0.0394 0.1227 -0.3519 0.0200 0.0424 0.5623 -0.2676
#> 0.6808 0.5491 -0.1461 -0.0809 -0.2801 0.3828 0.0884 0.4537 -0.2143
#> 0.4658 0.1921 -0.1459 0.1402 0.3799 -0.4738 0.1954 -0.1347 -0.0131
#>
#> Columns 19 to 20 -0.1553 -0.4013
#> -0.4713 -0.5312
#> -0.4285 -0.2047
#>
#> (3,.,.) =
#> Columns 1 to 9 0.2263 0.2780 0.1459 0.1917 -0.6589 -0.8145 -0.0519 -0.3719 -0.0626
#> 0.2244 -0.1451 -0.7030 -0.0811 -0.4256 -0.5212 0.6773 -0.2766 -0.1071
#> 0.4471 0.2185 0.0320 -0.3334 -0.2179 -0.5710 0.3336 0.1183 0.1767
#> ... [the output was truncated (use n=-1 to disable)]
#> [ CPUFloatType{5,3,20} ][ grad_fn = <StackBackward0> ]
#>
#> [[2]]
#> torch_tensor
#> (1,.,.) =
#> Columns 1 to 9 0.2563 0.1307 -0.6910 0.1910 0.1042 -0.4600 -0.3984 -0.3604 -0.7798
#> 0.2666 0.6401 -0.1285 -0.8123 -0.2548 -0.3054 -0.1305 -0.3466 -0.3573
#> -0.2290 -0.1048 -0.1076 0.7919 0.1959 0.7532 -0.2737 0.3381 -0.8571
#>
#> Columns 10 to 18 0.1114 -0.7475 -0.6247 0.4866 0.5896 -0.7657 0.3730 0.2111 -0.0919
#> 0.5619 0.0921 -0.1832 0.4611 -0.0863 -0.7105 0.0184 0.5867 -0.2630
#> 0.2107 -0.4015 -0.4336 0.4725 0.4456 -0.7413 -0.0817 -0.1985 -0.7257
#>
#> Columns 19 to 20 0.0118 -0.3358
#> 0.1992 0.3393
#> 0.2263 -0.6272
#>
#> (2,.,.) =
#> Columns 1 to 9 0.6570 0.6872 0.2591 -0.0225 0.0223 -0.2467 -0.1958 -0.1411 -0.2049
#> 0.6494 0.2742 0.0539 0.1086 0.0586 -0.5105 0.2676 0.2155 -0.0173
#> 0.1773 0.6330 0.3962 -0.0560 -0.4220 -0.8272 -0.4716 -0.1933 -0.0093
#>
#> Columns 10 to 18 0.7962 0.5407 0.0925 -0.0587 -0.5792 -0.0006 0.4344 -0.2491 -0.0425
#> 0.2025 0.7779 -0.0841 0.5726 -0.4950 -0.2495 0.3615 -0.0169 -0.3604
#> 0.7494 0.2796 0.3781 -0.0473 -0.4167 -0.0868 0.3085 0.1523 -0.1879
#>
#> Columns 19 to 20 -0.5484 -0.5117
#> -0.3170 -0.0604
#> -0.2903 -0.0999
#> [ CPUFloatType{2,3,20} ][ grad_fn = <StackBackward0> ]
#>