Applies a multi-layer Elman RNN with \(\tanh\) or \(\mbox{ReLU}\) non-linearity to an input sequence.
Usage
nn_rnn(
input_size,
hidden_size,
num_layers = 1,
nonlinearity = NULL,
bias = TRUE,
batch_first = FALSE,
dropout = 0,
bidirectional = FALSE,
...
)
Arguments
- input_size
The number of expected features in the input x
- hidden_size
The number of features in the hidden state h
- num_layers
Number of recurrent layers. E.g., setting num_layers=2 would mean stacking two RNNs together to form a stacked RNN, with the second RNN taking in outputs of the first RNN and computing the final results. Default: 1
- nonlinearity
The non-linearity to use. Can be either 'tanh' or 'relu'. Default: 'tanh'
- bias
If FALSE, then the layer does not use bias weights b_ih and b_hh. Default: TRUE
- batch_first
If TRUE, then the input and output tensors are provided as (batch, seq, feature). Default: FALSE
- dropout
If non-zero, introduces a Dropout layer on the outputs of each RNN layer except the last layer, with dropout probability equal to dropout. Default: 0
- bidirectional
If TRUE, becomes a bidirectional RNN. Default: FALSE. A short usage sketch illustrating batch_first, dropout and bidirectional follows this list.
- ...
Other arguments that can be passed to the super class.
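As a quick illustration of how batch_first, dropout and bidirectional interact with the tensor shapes (a minimal sketch; the sizes below are arbitrary and not part of this documentation):

library(torch)

# Minimal sketch: batch_first = TRUE puts the batch dimension first,
# bidirectional = TRUE doubles the feature dimension of the output.
rnn <- nn_rnn(
  input_size = 10, hidden_size = 20, num_layers = 2,
  batch_first = TRUE, bidirectional = TRUE, dropout = 0.2
)

x <- torch_randn(3, 5, 10)  # (batch, seq, feature) because batch_first = TRUE
out <- rnn(x)               # h_0 defaults to zeros when omitted

out[[1]]$shape  # (3, 5, 40): (batch, seq, num_directions * hidden_size)
out[[2]]$shape  # (4, 3, 20): (num_layers * num_directions, batch, hidden_size)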
Details
For each element in the input sequence, each layer computes the following function:
$$ h_t = \tanh(W_{ih} x_t + b_{ih} + W_{hh} h_{(t-1)} + b_{hh}) $$
where \(h_t\) is the hidden state at time t, \(x_t\) is the input at time t, and \(h_{(t-1)}\) is the hidden state of the previous layer at time t-1 or the initial hidden state at time 0.
If nonlinearity is 'relu', then \(\mbox{ReLU}\) is used instead of \(\tanh\).
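To make the recurrence concrete, a single step can be reproduced with plain tensor operations (an illustrative sketch only; the weight and bias tensors below are made up for the example and are not taken from a module's own parameters):

library(torch)

# One step of the Elman recurrence from the formula above, with
# hypothetical weights; torch_relu() would replace torch_tanh() when
# nonlinearity = 'relu'.
input_size <- 10
hidden_size <- 20

W_ih <- torch_randn(hidden_size, input_size)
W_hh <- torch_randn(hidden_size, hidden_size)
b_ih <- torch_randn(hidden_size)
b_hh <- torch_randn(hidden_size)

x_t    <- torch_randn(input_size)     # input at time t
h_prev <- torch_zeros(hidden_size)    # initial hidden state at time 0

h_t <- torch_tanh(
  torch_matmul(W_ih, x_t) + b_ih + torch_matmul(W_hh, h_prev) + b_hh
)
h_t$shape  # (20)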
Inputs
- input of shape (seq_len, batch, input_size): tensor containing the features of the input sequence. The input can also be a packed variable length sequence.
- h_0 of shape (num_layers * num_directions, batch, hidden_size): tensor containing the initial hidden state for each element in the batch. Defaults to zero if not provided. If the RNN is bidirectional, num_directions should be 2, else it should be 1.
Outputs
- output of shape (seq_len, batch, num_directions * hidden_size): tensor containing the output features (h_t) from the last layer of the RNN, for each t. If an nn_packed_sequence has been given as the input, the output will also be a packed sequence. For the unpacked case, the directions can be separated using output$view(seq_len, batch, num_directions, hidden_size), with forward and backward being direction 0 and 1 respectively. Similarly, the directions can be separated in the packed case.
- h_n of shape (num_layers * num_directions, batch, hidden_size): tensor containing the hidden state for t = seq_len. Like output, the layers can be separated using h_n$view(num_layers, num_directions, batch, hidden_size).
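For example, separating the two directions of a bidirectional RNN in the unpacked case (a sketch with arbitrary sizes; note that R tensors are indexed starting at 1, so direction 0 above corresponds to index 1 here):

library(torch)

# Sketch: split the forward and backward directions of the output.
rnn <- nn_rnn(input_size = 10, hidden_size = 20, bidirectional = TRUE)

seq_len <- 5
batch <- 3
out <- rnn(torch_randn(seq_len, batch, 10))
output <- out[[1]]  # (seq_len, batch, num_directions * hidden_size)

dirs <- output$view(c(seq_len, batch, 2, 20))
forward  <- dirs[ , , 1, ]  # direction 0 in the text above (forward)
backward <- dirs[ , , 2, ]  # direction 1 in the text above (backward)
forward$shape  # (5, 3, 20)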
Shape
- Input1: \((L, N, H_{in})\) tensor containing input features, where \(H_{in}=\mbox{input\_size}\) and L represents the sequence length.
- Input2: \((S, N, H_{out})\) tensor containing the initial hidden state for each element in the batch, where \(H_{out}=\mbox{hidden\_size}\) and \(S=\mbox{num\_layers} * \mbox{num\_directions}\). Defaults to zero if not provided. If the RNN is bidirectional, num_directions should be 2, else it should be 1.
- Output1: \((L, N, H_{all})\) tensor where \(H_{all}=\mbox{num\_directions} * \mbox{hidden\_size}\)
- Output2: \((S, N, H_{out})\) tensor containing the next hidden state for each element in the batch
Attributes
- weight_ih_l[k]: the learnable input-hidden weights of the k-th layer, of shape (hidden_size, input_size) for k = 0. Otherwise, the shape is (hidden_size, num_directions * hidden_size).
- weight_hh_l[k]: the learnable hidden-hidden weights of the k-th layer, of shape (hidden_size, hidden_size)
- bias_ih_l[k]: the learnable input-hidden bias of the k-th layer, of shape (hidden_size)
- bias_hh_l[k]: the learnable hidden-hidden bias of the k-th layer, of shape (hidden_size)
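These attributes live in the module's parameter list; a sketch of inspecting them follows (the exact parameter names are assumed to follow the weight_ih_l[k] scheme above, so check names(rnn$parameters) in your installation):

library(torch)

# Sketch: list the learnable parameters of a 2-layer RNN and their shapes.
rnn <- nn_rnn(input_size = 10, hidden_size = 20, num_layers = 2)

names(rnn$parameters)  # parameter names, e.g. the weight_ih_l[k] family
sapply(rnn$parameters, function(p) paste(p$shape, collapse = " x "))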
Note
All the weights and biases are initialized from \(\mathcal{U}(-\sqrt{k}, \sqrt{k})\) where \(k = \frac{1}{\mbox{hidden\_size}}\)
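This bound can be checked empirically (a quick sketch; with hidden_size = 20 the bound is sqrt(1/20)):

library(torch)

# Sketch: every parameter should lie within [-sqrt(k), sqrt(k)], k = 1/hidden_size.
rnn <- nn_rnn(input_size = 10, hidden_size = 20)
bound <- sqrt(1 / 20)

all(sapply(rnn$parameters, function(p) as.numeric(p$abs()$max()) <= bound))  # TRUE if the bound holds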
Examples
if (torch_is_installed()) {
rnn <- nn_rnn(10, 20, 2)        # input_size = 10, hidden_size = 20, num_layers = 2
input <- torch_randn(5, 3, 10)  # (seq_len, batch, input_size)
h0 <- torch_randn(2, 3, 20)     # (num_layers * num_directions, batch, hidden_size)
rnn(input, h0)                  # returns list(output, h_n)
}
#> [[1]]
#> torch_tensor
#> (1,.,.) =
#> Columns 1 to 9 0.6194 0.1261 -0.5859 -0.5130 0.2639 0.2436 0.7532 -0.1778 -0.1698
#> -0.7915 -0.2606 0.5423 -0.2332 -0.5692 0.0797 -0.3458 -0.2994 -0.4752
#> 0.7498 -0.1321 -0.2842 0.4756 -0.1615 0.2806 0.3849 0.2648 -0.1627
#>
#> Columns 10 to 18 -0.4432 -0.2098 -0.7064 -0.0722 -0.5759 -0.7316 0.3305 -0.2668 0.0912
#> 0.3085 -0.2419 0.3502 0.3794 -0.2719 0.7153 0.0670 0.6914 0.6160
#> -0.0390 0.7705 0.9537 0.7160 -0.5791 0.8475 -0.8662 0.0169 -0.3920
#>
#> Columns 19 to 20 -0.1107 0.3902
#> 0.4782 -0.1679
#> -0.9254 0.8202
#>
#> (2,.,.) =
#> Columns 1 to 9 -0.2733 0.5807 -0.5705 -0.2137 -0.1263 0.2720 0.6784 0.3367 0.2812
#> -0.2298 0.4555 -0.4291 0.0732 0.4738 -0.2003 0.2658 0.3731 -0.0465
#> -0.2809 0.1940 -0.0990 0.2505 -0.0127 -0.4197 0.1765 0.8446 -0.0952
#>
#> Columns 10 to 18 0.2151 0.3480 0.2816 0.6208 -0.4694 -0.2574 -0.3221 0.7333 -0.4520
#> -0.2748 -0.2761 -0.7137 0.4835 -0.1636 -0.6345 -0.0590 0.2484 0.0123
#> -0.5653 0.5783 -0.5571 0.2605 0.1348 -0.0049 0.2231 0.4494 0.2747
#>
#> Columns 19 to 20 0.4467 0.2031
#> -0.3049 0.3067
#> -0.0243 0.4457
#>
#> (3,.,.) =
#> Columns 1 to 9 -0.4587 0.5419 -0.3534 -0.4045 -0.4276 -0.2290 0.0918 0.5118 -0.5355
#> -0.1085 0.3107 -0.3817 -0.3413 -0.4206 -0.2209 0.3014 0.1557 -0.1130
#> 0.2597 0.2839 -0.4300 0.1852 0.0349 0.0473 0.3646 -0.3881 0.4646
#> ... [the output was truncated (use n=-1 to disable)]
#> [ CPUFloatType{5,3,20} ][ grad_fn = <StackBackward0> ]
#>
#> [[2]]
#> torch_tensor
#> (1,.,.) =
#> Columns 1 to 9 0.0086 -0.2831 -0.5776 0.0057 -0.0709 0.6271 0.5970 0.2080 0.1378
#> 0.0209 -0.5796 -0.3668 0.0877 -0.3049 0.4987 0.4293 -0.7983 -0.5304
#> 0.2519 -0.3285 0.2158 -0.1921 0.4215 0.2205 0.2465 -0.4121 -0.2367
#>
#> Columns 10 to 18 0.8283 0.0237 0.4405 0.3959 0.7915 -0.3492 -0.5114 0.2015 0.4390
#> 0.4698 -0.1859 0.3392 0.6283 0.7378 0.7038 0.3066 0.7614 -0.0596
#> -0.0976 0.1275 0.5496 0.4558 -0.0012 0.2623 0.1556 -0.0674 -0.4739
#>
#> Columns 19 to 20 0.4836 0.6168
#> -0.4734 0.4156
#> -0.2577 0.4979
#>
#> (2,.,.) =
#> Columns 1 to 9 -0.1114 0.3486 -0.4200 -0.1580 -0.5198 -0.3118 0.2938 0.5197 -0.0018
#> 0.2732 0.4590 -0.4445 0.2851 -0.4892 0.0292 0.2732 0.1335 0.0589
#> 0.3064 0.3742 -0.6699 -0.2006 -0.5265 -0.1959 0.4728 0.1926 -0.0590
#>
#> Columns 10 to 18 0.0210 0.3693 -0.2735 0.4198 -0.6618 -0.2076 -0.0963 0.3921 -0.5134
#> 0.1978 0.4044 -0.1352 0.3989 -0.5551 -0.5138 0.0447 0.2919 -0.1836
#> -0.1193 -0.0102 0.0760 0.4765 -0.0831 -0.3886 0.2522 0.4670 -0.1084
#>
#> Columns 19 to 20 -0.0799 0.3596
#> 0.2372 0.5238
#> 0.2247 0.4753
#> [ CPUFloatType{2,3,20} ][ grad_fn = <StackBackward0> ]
#>