Applies a multi-layer Elman RNN with \(\tanh\) or \(\mbox{ReLU}\) non-linearity to an input sequence.
Usage
nn_rnn(
  input_size,
  hidden_size,
  num_layers = 1,
  nonlinearity = NULL,
  bias = TRUE,
  batch_first = FALSE,
  dropout = 0,
  bidirectional = FALSE,
  ...
)
Arguments
- input_size
The number of expected features in the input x
- hidden_size
The number of features in the hidden state h
- num_layers
Number of recurrent layers. E.g., setting num_layers = 2 would mean stacking two RNNs together to form a stacked RNN, with the second RNN taking in outputs of the first RNN and computing the final results. Default: 1
- nonlinearity
The non-linearity to use. Can be either 'tanh' or 'relu'. Default: 'tanh'
- bias
If FALSE, then the layer does not use bias weights b_ih and b_hh. Default: TRUE
- batch_first
If TRUE, then the input and output tensors are provided as (batch, seq, feature). Default: FALSE
- dropout
If non-zero, introduces a Dropout layer on the outputs of each RNN layer except the last layer, with dropout probability equal to dropout. Default: 0
- bidirectional
If TRUE, becomes a bidirectional RNN. Default: FALSE
- ...
other arguments that can be passed to the super class.
Details
For each element in the input sequence, each layer computes the following function:
$$ h_t = \tanh(W_{ih} x_t + b_{ih} + W_{hh} h_{(t-1)} + b_{hh}) $$
where \(h_t\) is the hidden state at time \(t\), \(x_t\) is the input at time \(t\), and \(h_{(t-1)}\) is the hidden state of the previous layer at time \(t-1\) or the initial hidden state at time \(0\).
If nonlinearity is 'relu', then \(\mbox{ReLU}\) is used instead of \(\tanh\).
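As an illustration of this recurrence, the sketch below recomputes one time step by hand and compares it with the module's output for a length-one sequence. It assumes the flat parameters can be reached through the module's $parameters list under the names given in the Attributes section (weight_ih_l0, weight_hh_l0, bias_ih_l0, bias_hh_l0); treat it as a sketch rather than part of the documented interface.
library(torch)

rnn <- nn_rnn(input_size = 4, hidden_size = 3, num_layers = 1)
p <- rnn$parameters

x_t <- torch_randn(4)      # input at time t
h_prev <- torch_zeros(3)   # initial hidden state (time 0)

# h_t = tanh(W_ih x_t + b_ih + W_hh h_(t-1) + b_hh)
h_t <- torch_tanh(
  torch_mv(p$weight_ih_l0, x_t) + p$bias_ih_l0 +
    torch_mv(p$weight_hh_l0, h_prev) + p$bias_hh_l0
)

# the module should give the same value for a length-1 sequence with batch size 1
res <- rnn(x_t$view(c(1, 1, 4)), h_prev$view(c(1, 1, 3)))
res[[1]]$squeeze()   # expected to match h_t up to floating point error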
Inputs
- input of shape (seq_len, batch, input_size): tensor containing the features of the input sequence. The input can also be a packed variable length sequence.
- h_0 of shape (num_layers * num_directions, batch, hidden_size): tensor containing the initial hidden state for each element in the batch. Defaults to zero if not provided. If the RNN is bidirectional, num_directions should be 2, else it should be 1.
Outputs
- output of shape (seq_len, batch, num_directions * hidden_size): tensor containing the output features (h_t) from the last layer of the RNN, for each t. If an nn_packed_sequence has been given as the input, the output will also be a packed sequence. For the unpacked case, the directions can be separated using output$view(seq_len, batch, num_directions, hidden_size), with forward and backward being direction 0 and 1 respectively. Similarly, the directions can be separated in the packed case.
- h_n of shape (num_layers * num_directions, batch, hidden_size): tensor containing the hidden state for t = seq_len. Like output, the layers can be separated using h_n$view(num_layers, num_directions, batch, hidden_size).
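A minimal sketch of the direction and layer separation described above, for a bidirectional RNN; the sizes are passed to view() as a vector here, and R indexing is 1-based, so the forward direction is the first slice along the direction axis.
library(torch)

seq_len <- 5; batch <- 3; hidden_size <- 20
rnn <- nn_rnn(10, hidden_size, num_layers = 2, bidirectional = TRUE)
res <- rnn(torch_randn(seq_len, batch, 10))
output <- res[[1]]   # (seq_len, batch, num_directions * hidden_size)
h_n <- res[[2]]      # (num_layers * num_directions, batch, hidden_size)

# separate the two directions in the last layer's output
dirs <- output$view(c(seq_len, batch, 2, hidden_size))
forward_out <- dirs[, , 1, ]    # forward direction
backward_out <- dirs[, , 2, ]   # backward direction

# separate layers and directions in the final hidden state
layers <- h_n$view(c(2, 2, batch, hidden_size))   # (num_layers, num_directions, batch, hidden_size)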
Shape
- Input1: \((L, N, H_{in})\) tensor containing input features, where \(H_{in}=\mbox{input\_size}\) and \(L\) represents a sequence length.
- Input2: \((S, N, H_{out})\) tensor containing the initial hidden state for each element in the batch, where \(H_{out}=\mbox{hidden\_size}\) and \(S=\mbox{num\_layers} * \mbox{num\_directions}\). Defaults to zero if not provided. If the RNN is bidirectional, num_directions should be 2, else it should be 1.
- Output1: \((L, N, H_{all})\) where \(H_{all}=\mbox{num\_directions} * \mbox{hidden\_size}\)
- Output2: \((S, N, H_{out})\) tensor containing the next hidden state for each element in the batch
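A minimal sketch checking these shapes for a batch-first, bidirectional configuration; with batch_first = TRUE the batch and sequence axes of the input and output are swapped, while the hidden-state shape is unchanged.
library(torch)

rnn <- nn_rnn(
  input_size = 10, hidden_size = 20, num_layers = 2,
  batch_first = TRUE, bidirectional = TRUE
)
x <- torch_randn(3, 5, 10)   # (batch, seq, feature) because batch_first = TRUE
res <- rnn(x)
res[[1]]$shape   # 3 5 40: H_all = num_directions * hidden_size = 2 * 20
res[[2]]$shape   # 4 3 20: S = num_layers * num_directions = 2 * 2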
Attributes
- weight_ih_l[k]: the learnable input-hidden weights of the k-th layer, of shape (hidden_size, input_size) for k = 0. Otherwise, the shape is (hidden_size, num_directions * hidden_size)
- weight_hh_l[k]: the learnable hidden-hidden weights of the k-th layer, of shape (hidden_size, hidden_size)
- bias_ih_l[k]: the learnable input-hidden bias of the k-th layer, of shape (hidden_size)
- bias_hh_l[k]: the learnable hidden-hidden bias of the k-th layer, of shape (hidden_size)
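A quick sketch listing these parameters and their shapes, assuming they are exposed through the module's $parameters field under the names above:
library(torch)

rnn <- nn_rnn(input_size = 10, hidden_size = 20, num_layers = 2)
# named list with entries such as weight_ih_l0, weight_hh_l0, bias_ih_l0, bias_hh_l0, ...
sapply(rnn$parameters, function(p) paste(p$shape, collapse = " x "))
# weight_ih_l0 should be 20 x 10, while weight_ih_l1 should be 20 x 20, since
# layer k > 0 takes the previous layer's hidden state as its input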
Note
All the weights and biases are initialized from \(\mathcal{U}(-\sqrt{k}, \sqrt{k})\) where \(k = \frac{1}{\mbox{hidden\_size}}\).
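A rough sanity check of this bound (a sketch, assuming the uniform initialisation stated above):
library(torch)

rnn <- nn_rnn(10, 20)
bound <- sqrt(1 / 20)   # sqrt(k) with k = 1 / hidden_size
# every weight and bias should lie within (-sqrt(k), sqrt(k))
all(sapply(rnn$parameters, function(p) p$abs()$max()$item() <= bound))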
Examples
if (torch_is_installed()) {
  rnn <- nn_rnn(10, 20, 2)         # input_size = 10, hidden_size = 20, num_layers = 2
  input <- torch_randn(5, 3, 10)   # (seq_len, batch, input_size)
  h0 <- torch_randn(2, 3, 20)      # (num_layers * num_directions, batch, hidden_size)
  rnn(input, h0)                   # returns list(output, h_n)
}
#> [[1]]
#> torch_tensor
#> (1,.,.) =
#> Columns 1 to 9 0.8040 -0.3092 -0.0555 0.2335 -0.2217 -0.5908 0.2679 0.3845 0.0937
#> -0.2382 0.5811 0.6305 0.0844 -0.4830 -0.2957 0.0956 0.9263 0.4017
#> -0.4306 -0.3961 0.6024 0.4475 0.1841 -0.6021 -0.5018 0.3913 -0.7079
#>
#> Columns 10 to 18 -0.5059 -0.6756 0.8349 0.4919 -0.4339 -0.8934 0.2698 -0.2857 -0.4519
#> -0.8507 -0.0182 0.2623 -0.3637 -0.2642 0.2595 0.0666 0.7768 0.1554
#> -0.9662 -0.0063 -0.6959 -0.4043 -0.0955 0.9586 -0.6057 0.0511 0.6845
#>
#> Columns 19 to 20 0.7976 0.1734
#> 0.6827 -0.1776
#> -0.7289 -0.7782
#>
#> (2,.,.) =
#> Columns 1 to 9 0.3889 -0.3601 -0.0187 -0.2968 -0.0215 0.0275 -0.0976 -0.6676 0.4277
#> -0.8194 0.7075 -0.1063 -0.4142 0.4080 -0.4120 0.0666 0.3513 0.0828
#> -0.3121 0.2380 0.0845 -0.3078 0.1855 -0.1288 0.5471 0.0283 0.2182
#>
#> Columns 10 to 18 -0.4533 -0.4032 0.5500 0.3453 0.1722 0.1118 0.1523 -0.4451 -0.5157
#> -0.3841 0.3041 0.2955 -0.3555 0.3039 0.3432 0.4212 0.2755 -0.1093
#> -0.1794 0.1411 0.4756 -0.0663 0.7400 0.2700 0.3029 0.5518 0.1165
#>
#> Columns 19 to 20 -0.2847 -0.0894
#> 0.0462 -0.1315
#> -0.1700 -0.5845
#>
#> (3,.,.) =
#> Columns 1 to 9 -0.0975 0.1631 -0.0524 -0.2066 -0.5398 0.0839 -0.2147 -0.1502 0.4560
#> -0.2505 0.3777 -0.1579 -0.5296 -0.0623 0.2707 0.3289 0.2627 0.5263
#> -0.7112 0.1882 -0.4830 -0.2793 -0.2617 0.2113 0.1065 0.3081 0.4845
#> ... [the output was truncated (use n=-1 to disable)]
#> [ CPUFloatType{5,3,20} ][ grad_fn = <StackBackward0> ]
#>
#> [[2]]
#> torch_tensor
#> (1,.,.) =
#> Columns 1 to 9 -0.5894 -0.8975 -0.1316 -0.1465 -0.3198 0.4174 -0.1543 -0.7180 0.3137
#> 0.1829 -0.3649 -0.2477 -0.2714 -0.3783 -0.3590 -0.3948 -0.0928 0.3611
#> -0.4389 -0.2385 0.5147 -0.6942 -0.1651 0.1956 -0.0757 -0.4346 0.1995
#>
#> Columns 10 to 18 -0.5749 -0.2482 0.7138 0.5591 -0.0564 -0.2561 -0.7119 -0.2263 -0.1713
#> 0.5589 0.4213 0.1974 0.3019 -0.4558 0.4529 0.0645 0.3129 0.0558
#> -0.1634 0.0046 -0.1113 -0.4117 -0.2508 0.2835 0.1263 -0.1403 0.2300
#>
#> Columns 19 to 20 -0.0693 -0.4724
#> 0.0856 -0.1832
#> -0.2027 0.3731
#>
#> (2,.,.) =
#> Columns 1 to 9 -0.0995 0.4661 0.0508 -0.1046 -0.6234 0.0702 -0.2974 0.0622 0.1838
#> -0.3789 0.2820 -0.2073 -0.2903 -0.1876 0.2477 0.2295 -0.3757 0.2671
#> -0.0732 0.4498 -0.3874 -0.3978 -0.2299 0.2338 0.1490 -0.2620 0.6278
#>
#> Columns 10 to 18 -0.4097 -0.0530 0.4075 -0.3532 -0.2414 -0.1718 0.1625 0.0727 -0.1397
#> -0.0966 -0.0552 0.3717 -0.0748 0.3183 0.0857 0.1956 -0.0692 -0.4201
#> -0.0819 -0.3396 0.5400 0.0478 0.1245 -0.1068 0.0449 0.1172 -0.4559
#>
#> Columns 19 to 20 0.0983 0.2485
#> 0.0664 -0.0674
#> 0.3425 0.2600
#> [ CPUFloatType{2,3,20} ][ grad_fn = <StackBackward0> ]
#>