Applies a multi-layer Elman RNN with \(\tanh\) or \(\mbox{ReLU}\) non-linearity to an input sequence.
Usage
nn_rnn(
  input_size,
  hidden_size,
  num_layers = 1,
  nonlinearity = NULL,
  bias = TRUE,
  batch_first = FALSE,
  dropout = 0,
  bidirectional = FALSE,
  ...
)
Arguments
- input_size: The number of expected features in the input x.
- hidden_size: The number of features in the hidden state h.
- num_layers: Number of recurrent layers. E.g., setting num_layers = 2 would mean stacking two RNNs together to form a stacked RNN, with the second RNN taking in outputs of the first RNN and computing the final results. Default: 1.
- nonlinearity: The non-linearity to use. Can be either 'tanh' or 'relu'. Default: 'tanh'.
- bias: If FALSE, then the layer does not use bias weights b_ih and b_hh. Default: TRUE.
- batch_first: If TRUE, then the input and output tensors are provided as (batch, seq, feature). Default: FALSE. (See the sketch after this list for how this interacts with the expected shapes.)
- dropout: If non-zero, introduces a Dropout layer on the outputs of each RNN layer except the last layer, with dropout probability equal to dropout. Default: 0.
- bidirectional: If TRUE, becomes a bidirectional RNN. Default: FALSE.
- ...: other arguments that can be passed to the super class.
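A minimal sketch of how batch_first and bidirectional change the expected tensor shapes (the sizes used here are arbitrary; shape conventions are those described under Inputs and Outputs below):

library(torch)

rnn <- nn_rnn(
  input_size = 10, hidden_size = 20, num_layers = 2,
  batch_first = TRUE, bidirectional = TRUE
)

x <- torch_randn(3, 5, 10)   # (batch, seq, feature) because batch_first = TRUE
h0 <- torch_zeros(4, 3, 20)  # (num_layers * num_directions = 4, batch, hidden_size)

out <- rnn(x, h0)
out[[1]]$shape  # 3 5 40: (batch, seq, num_directions * hidden_size)
out[[2]]$shape  # 4 3 20: (num_layers * num_directions, batch, hidden_size)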
Details
For each element in the input sequence, each layer computes the following function:
$$ h_t = \tanh(W_{ih} x_t + b_{ih} + W_{hh} h_{(t-1)} + b_{hh}) $$
where \(h_t\) is the hidden state at time \(t\), \(x_t\) is the input at time \(t\), and \(h_{(t-1)}\) is the hidden state of the previous layer at time \(t-1\) or the initial hidden state at time \(0\).
If nonlinearity is 'relu', then \(\mbox{ReLU}\) is used instead of \(\tanh\).
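To make the recurrence concrete, here is a minimal sketch that recomputes a single step by hand and compares it with the module's output. It assumes a single layer with the default tanh non-linearity, and that the first layer's tensors appear in rnn$parameters in the order weight_ih, weight_hh, bias_ih, bias_hh:

library(torch)

rnn <- nn_rnn(input_size = 4, hidden_size = 3, num_layers = 1)

x      <- torch_randn(1, 1, 4)  # (seq_len = 1, batch = 1, input_size)
h_prev <- torch_zeros(1, 1, 3)  # (num_layers = 1, batch = 1, hidden_size)

out <- rnn(x, h_prev)

p    <- rnn$parameters          # assumed order: w_ih, w_hh, b_ih, b_hh
w_ih <- p[[1]]; w_hh <- p[[2]]; b_ih <- p[[3]]; b_hh <- p[[4]]

# h_t = tanh(W_ih x_t + b_ih + W_hh h_(t-1) + b_hh)
h_t <- torch_tanh(
  torch_matmul(x[1, , ], w_ih$t()) + b_ih +
    torch_matmul(h_prev[1, , ], w_hh$t()) + b_hh
)
torch_allclose(out[[1]][1, , ], h_t)  # should return TRUE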
Inputs
- input of shape (seq_len, batch, input_size): tensor containing the features of the input sequence. The input can also be a packed variable length sequence.
- h_0 of shape (num_layers * num_directions, batch, hidden_size): tensor containing the initial hidden state for each element in the batch. Defaults to zero if not provided. If the RNN is bidirectional, num_directions should be 2, else it should be 1.
Outputs
- output of shape (seq_len, batch, num_directions * hidden_size): tensor containing the output features (h_t) from the last layer of the RNN, for each t. If an nn_packed_sequence has been given as the input, the output will also be a packed sequence. For the unpacked case, the directions can be separated using output$view(seq_len, batch, num_directions, hidden_size), with forward and backward being direction 0 and 1 respectively (see the sketch after this list). Similarly, the directions can be separated in the packed case.
- h_n of shape (num_layers * num_directions, batch, hidden_size): tensor containing the hidden state for t = seq_len. Like output, the layers can be separated using h_n$view(num_layers, num_directions, batch, hidden_size).
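A hedged illustration of separating the two directions in the unpacked case, assuming a single-layer bidirectional RNN with the default seq-first layout (note that R indexing is 1-based, so direction 0 is the first slice):

library(torch)

rnn  <- nn_rnn(10, 20, bidirectional = TRUE)
x    <- torch_randn(5, 3, 10)           # (seq_len, batch, input_size)
res  <- rnn(x)                          # h_0 defaults to zeros
dirs <- res[[1]]$view(c(5, 3, 2, 20))   # (seq_len, batch, num_directions, hidden_size)
fwd  <- dirs[ , , 1, ]                  # direction 0 (forward)
bwd  <- dirs[ , , 2, ]                  # direction 1 (backward)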
Shape
- Input1: \((L, N, H_{in})\) tensor containing input features, where \(H_{in}=\mbox{input\_size}\) and \(L\) represents a sequence length.
- Input2: \((S, N, H_{out})\) tensor containing the initial hidden state for each element in the batch, where \(H_{out}=\mbox{hidden\_size}\) and \(S=\mbox{num\_layers} * \mbox{num\_directions}\). Defaults to zero if not provided. If the RNN is bidirectional, num_directions should be 2, else it should be 1.
- Output1: \((L, N, H_{all})\) where \(H_{all}=\mbox{num\_directions} * \mbox{hidden\_size}\).
- Output2: \((S, N, H_{out})\) tensor containing the next hidden state for each element in the batch.
Attributes
- weight_ih_l[k]: the learnable input-hidden weights of the k-th layer, of shape (hidden_size, input_size) for k = 0. Otherwise, the shape is (hidden_size, num_directions * hidden_size).
- weight_hh_l[k]: the learnable hidden-hidden weights of the k-th layer, of shape (hidden_size, hidden_size).
- bias_ih_l[k]: the learnable input-hidden bias of the k-th layer, of shape (hidden_size).
- bias_hh_l[k]: the learnable hidden-hidden bias of the k-th layer, of shape (hidden_size).
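These shapes can be inspected directly from the module's named parameter list; a quick sketch (the exact name suffixes follow the convention above):

library(torch)

rnn <- nn_rnn(input_size = 10, hidden_size = 20, num_layers = 2)
names(rnn$parameters)                        # weight/bias names for each layer
lapply(rnn$parameters, function(p) p$shape)  # e.g. first-layer input-hidden weights: 20 10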
Note
All the weights and biases are initialized from \(\mathcal{U}(-\sqrt{k}, \sqrt{k})\) where \(k = \frac{1}{\mbox{hidden\_size}}\)
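As an empirical sketch of this note (assuming the first entry of rnn$parameters is a weight tensor): with hidden_size = 20, \(\sqrt{k} = \sqrt{1/20} \approx 0.224\), so every sampled value should lie within that bound.

library(torch)

rnn <- nn_rnn(10, 20)
bound <- sqrt(1 / 20)           # ~0.224
w <- rnn$parameters[[1]]        # a weight tensor (assumed ordering)
all(abs(as_array(w)) <= bound)  # should be TRUE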
Examples
if (torch_is_installed()) {
  # Two-layer RNN with 10 input features and 20 hidden units
  rnn <- nn_rnn(10, 20, 2)
  # Input: (seq_len = 5, batch = 3, input_size = 10)
  input <- torch_randn(5, 3, 10)
  # Initial hidden state: (num_layers * num_directions = 2, batch = 3, hidden_size = 20)
  h0 <- torch_randn(2, 3, 20)
  # Returns a list: the output features and the final hidden state h_n
  rnn(input, h0)
}
#> [[1]]
#> torch_tensor
#> (1,.,.) =
#> Columns 1 to 9 -0.7760 0.7453 -0.8135 -0.1260 -0.6849 -0.5407 -0.1798 -0.0782 0.3087
#> -0.3387 -0.4634 0.5041 0.6188 0.7599 -0.4540 -0.9372 0.2091 -0.5749
#> 0.1445 0.8924 -0.4224 0.1523 0.2014 0.1200 -0.3852 0.0442 0.6321
#>
#> Columns 10 to 18 -0.4789 0.8594 -0.0204 -0.6847 -0.5105 0.6508 -0.3157 -0.7205 -0.6654
#> 0.9237 0.5144 -0.8931 -0.5068 0.3923 -0.2059 -0.3604 0.8662 -0.0582
#> 0.9189 -0.0485 -0.1994 -0.2125 -0.4593 0.4590 0.8533 0.8538 0.2311
#>
#> Columns 19 to 20 0.8080 -0.2104
#> 0.3382 -0.4767
#> 0.2500 -0.3892
#>
#> (2,.,.) =
#> Columns 1 to 9 -0.2815 0.5201 -0.6495 0.0351 -0.1513 -0.2072 -0.2565 -0.3849 -0.1523
#> 0.0071 0.7487 -0.0252 -0.1314 -0.5933 -0.0914 -0.2361 -0.2672 0.3404
#> -0.0799 0.6846 -0.0745 0.5310 0.2673 -0.4345 0.0682 0.2624 0.2324
#>
#> Columns 10 to 18 -0.7287 0.2205 -0.6220 0.4348 -0.4572 -0.1141 -0.0819 -0.0709 0.6546
#> 0.2467 0.7273 0.2330 0.0524 -0.0720 -0.1960 -0.1301 -0.4423 -0.6132
#> 0.4452 0.6180 -0.4571 0.0999 0.0538 0.2452 0.1344 -0.4337 0.1214
#>
#> Columns 19 to 20 0.0708 -0.5350
#> -0.0706 0.1784
#> 0.6142 -0.4183
#>
#> (3,.,.) =
#> Columns 1 to 9 -0.4763 0.5340 0.1026 0.4987 -0.1831 -0.2551 -0.0806 -0.4321 -0.3496
#> -0.2620 0.2877 -0.2450 -0.0023 0.3150 -0.1754 -0.2119 0.1692 -0.0932
#> -0.1292 0.6732 -0.2190 -0.0891 0.0655 0.1022 -0.3270 0.0892 0.0378
#> ... [the output was truncated (use n=-1 to disable)]
#> [ CPUFloatType{5,3,20} ][ grad_fn = <StackBackward0> ]
#>
#> [[2]]
#> torch_tensor
#> (1,.,.) =
#> Columns 1 to 9 0.4751 -0.8483 0.0879 0.6429 -0.5682 0.6717 -0.4121 -0.2049 0.6930
#> -0.7307 -0.6716 -0.1829 0.5269 -0.7630 0.3690 -0.6108 0.0137 0.8703
#> -0.8678 0.0393 0.3725 -0.5869 0.2251 -0.1100 0.0490 0.7668 0.1678
#>
#> Columns 10 to 18 0.6443 -0.7770 -0.4368 0.3099 -0.6685 -0.5122 0.2409 -0.7944 0.0299
#> -0.0300 -0.6571 0.1197 -0.4809 -0.4327 -0.1425 0.7034 -0.6868 -0.6755
#> -0.6285 0.0818 0.4730 -0.1781 0.2157 0.4803 0.0702 0.2968 -0.2557
#>
#> Columns 19 to 20 0.0032 -0.1562
#> -0.1597 0.1578
#> 0.1684 -0.4400
#>
#> (2,.,.) =
#> Columns 1 to 9 0.2249 0.6148 0.0257 0.2643 0.0557 0.4933 -0.0232 -0.2164 -0.1008
#> 0.2075 0.4470 -0.1378 0.3002 0.5191 0.1169 -0.0708 -0.0279 0.0517
#> -0.3448 -0.0203 -0.3674 0.1453 0.3233 -0.5030 -0.1293 0.1307 -0.1935
#>
#> Columns 10 to 18 0.5867 0.6088 -0.8176 0.1219 0.3052 0.4675 -0.0821 -0.6279 -0.4466
#> 0.6094 0.5894 -0.3966 -0.2328 0.5532 0.4560 0.4582 -0.5609 -0.2418
#> 0.3690 0.0953 -0.0665 0.2931 0.0232 -0.4560 0.1377 0.2364 0.6089
#>
#> Columns 19 to 20 0.5521 0.0811
#> -0.0332 -0.5511
#> -0.0849 -0.7609
#> [ CPUFloatType{2,3,20} ][ grad_fn = <StackBackward0> ]
#>