Applies a multi-layer Elman RNN with tanh or ReLU non-linearity to an input sequence.
Usage
nn_rnn(
input_size,
hidden_size,
num_layers = 1,
nonlinearity = NULL,
bias = TRUE,
batch_first = FALSE,
dropout = 0,
bidirectional = FALSE,
...
)
Arguments
- input_size
The number of expected features in the input x
- hidden_size
The number of features in the hidden state h
- num_layers
Number of recurrent layers. E.g., setting num_layers = 2 would mean stacking two RNNs together to form a stacked RNN, with the second RNN taking in outputs of the first RNN and computing the final results. Default: 1
- nonlinearity
The non-linearity to use. Can be either 'tanh' or 'relu'. Default: 'tanh'
- bias
If FALSE, then the layer does not use bias weights b_ih and b_hh. Default: TRUE
- batch_first
If TRUE, then the input and output tensors are provided as (batch, seq, feature). Default: FALSE
- dropout
If non-zero, introduces a Dropout layer on the outputs of each RNN layer except the last layer, with dropout probability equal to dropout. Default: 0
- bidirectional
If TRUE, becomes a bidirectional RNN. Default: FALSE (see the construction sketch following the argument list)
- ...
other arguments that can be passed to the super class.
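For instance, a two-layer bidirectional RNN with ReLU non-linearity, batch-first tensors, and inter-layer dropout could be constructed as in the following sketch (the sizes are arbitrary illustrations):

library(torch)

# arbitrary sizes chosen for illustration
rnn <- nn_rnn(
  input_size = 10,
  hidden_size = 20,
  num_layers = 2,
  nonlinearity = "relu",
  batch_first = TRUE,
  dropout = 0.2,
  bidirectional = TRUE
)

# with batch_first = TRUE the input is (batch, seq, feature)
x <- torch_randn(3, 5, 10)
out <- rnn(x)              # h_0 defaults to zeros
out[[1]]$shape             # (3, 5, 40): (batch, seq, num_directions * hidden_size)
out[[2]]$shape             # (4, 3, 20): (num_layers * num_directions, batch, hidden_size)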
Details
For each element in the input sequence, each layer computes the following function:
h_t = tanh(W_ih x_t + b_ih + W_hh h_(t-1) + b_hh)

where h_t is the hidden state at time t, x_t is the input at time t, and h_(t-1) is the hidden state of the previous layer at time t-1 or the initial hidden state at time 0. If nonlinearity is 'relu', then ReLU is used instead of tanh.
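The update above can also be written out directly with tensor operations. The following sketch performs a single Elman step with stand-in weights and biases (random placeholders, not the module's learned parameters):

library(torch)

# stand-in parameters for one layer: input_size = 10, hidden_size = 20
W_ih <- torch_randn(20, 10)
W_hh <- torch_randn(20, 20)
b_ih <- torch_zeros(20)
b_hh <- torch_zeros(20)

x_t    <- torch_randn(3, 10)   # batch of 3 inputs at time t
h_prev <- torch_zeros(3, 20)   # h_(t-1), here the initial hidden state

# h_t = tanh(W_ih x_t + b_ih + W_hh h_(t-1) + b_hh)
h_t <- torch_tanh(
  torch_mm(x_t, W_ih$t()) + b_ih +
    torch_mm(h_prev, W_hh$t()) + b_hh
)
h_t$shape   # (3, 20)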
Inputs
- input of shape (seq_len, batch, input_size): tensor containing the features of the input sequence. The input can also be a packed variable length sequence.
- h_0 of shape (num_layers * num_directions, batch, hidden_size): tensor containing the initial hidden state for each element in the batch. Defaults to zero if not provided. If the RNN is bidirectional, num_directions should be 2, else it should be 1.
Outputs
- output of shape (seq_len, batch, num_directions * hidden_size): tensor containing the output features (h_t) from the last layer of the RNN, for each t. If an nn_packed_sequence has been given as the input, the output will also be a packed sequence. For the unpacked case, the directions can be separated using output$view(seq_len, batch, num_directions, hidden_size), with forward and backward being direction 0 and 1 respectively. Similarly, the directions can be separated in the packed case (see the sketch following this list).
- h_n of shape (num_layers * num_directions, batch, hidden_size): tensor containing the hidden state for t = seq_len. Like output, the layers can be separated using h_n$view(num_layers, num_directions, batch, hidden_size).
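For a bidirectional RNN, the direction split described above looks like the following sketch (arbitrary sizes):

library(torch)

birnn <- nn_rnn(input_size = 10, hidden_size = 20, bidirectional = TRUE)
res <- birnn(torch_randn(5, 3, 10))   # h_0 defaults to zeros
output <- res[[1]]                    # (5, 3, 40) = (seq_len, batch, num_directions * hidden_size)
h_n    <- res[[2]]                    # (2, 3, 20) = (num_layers * num_directions, batch, hidden_size)

dirs <- output$view(c(5, 3, 2, 20))   # (seq_len, batch, num_directions, hidden_size)
forward_out  <- dirs[ , , 1, ]        # direction 0 above (R indexing is 1-based)
backward_out <- dirs[ , , 2, ]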
Shape
- Input1: (seq_len, batch, input_size) tensor containing input features, where seq_len represents a sequence length.
- Input2: (num_layers * num_directions, batch, hidden_size) tensor containing the initial hidden state for each element in the batch. Defaults to zero if not provided. If the RNN is bidirectional, num_directions should be 2, else it should be 1.
- Output1: (seq_len, batch, num_directions * hidden_size)
- Output2: (num_layers * num_directions, batch, hidden_size) tensor containing the next hidden state for each element in the batch
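These shapes can be checked directly; a quick sketch with arbitrary sizes:

library(torch)

rnn <- nn_rnn(input_size = 10, hidden_size = 20, num_layers = 2)
res <- rnn(torch_randn(7, 4, 10), torch_randn(2, 4, 20))
res[[1]]$shape   # (7, 4, 20)  i.e. (seq_len, batch, num_directions * hidden_size)
res[[2]]$shape   # (2, 4, 20)  i.e. (num_layers * num_directions, batch, hidden_size)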
Attributes
- weight_ih_l[k]: the learnable input-hidden weights of the k-th layer, of shape (hidden_size, input_size) for k = 0. Otherwise, the shape is (hidden_size, num_directions * hidden_size)
- weight_hh_l[k]: the learnable hidden-hidden weights of the k-th layer, of shape (hidden_size, hidden_size)
- bias_ih_l[k]: the learnable input-hidden bias of the k-th layer, of shape (hidden_size)
- bias_hh_l[k]: the learnable hidden-hidden bias of the k-th layer, of shape (hidden_size)
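The learned tensors can be inspected through the module's parameter list; a minimal sketch (the parameter names follow the weight_ih_l[k] / weight_hh_l[k] pattern above):

library(torch)

rnn <- nn_rnn(input_size = 10, hidden_size = 20, num_layers = 2)
names(rnn$parameters)   # weight/bias names per layer
sapply(rnn$parameters, function(p) paste(p$shape, collapse = " x "))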
Examples
if (torch_is_installed()) {
rnn <- nn_rnn(10, 20, 2)        # input_size = 10, hidden_size = 20, num_layers = 2
input <- torch_randn(5, 3, 10)  # (seq_len, batch, input_size)
h0 <- torch_randn(2, 3, 20)     # (num_layers * num_directions, batch, hidden_size)
rnn(input, h0)
}
#> [[1]]
#> torch_tensor
#> (1,.,.) =
#> Columns 1 to 9 0.5737 0.8175 0.7715 -0.4437 -0.0536 0.3481 -0.1660 0.5390 0.6151
#> 0.0435 -0.0704 -0.0372 -0.3621 0.9216 0.2836 0.6439 0.3410 0.1812
#> 0.7230 -0.6420 -0.7753 -0.3005 -0.6238 0.6109 -0.8732 -0.9056 0.7666
#>
#> Columns 10 to 18 0.4223 -0.2744 -0.6122 0.5143 -0.6387 0.6037 -0.6352 0.5786 -0.3871
#> 0.2441 -0.6054 -0.0027 -0.3015 -0.8889 -0.2609 -0.0350 0.1835 -0.4018
#> -0.8197 -0.4651 0.8905 -0.4860 -0.6609 0.7780 0.8382 0.5023 0.0108
#>
#> Columns 19 to 20 0.6311 -0.4435
#> 0.7361 0.4022
#> 0.7677 0.2430
#>
#> (2,.,.) =
#> Columns 1 to 9 -0.1284 -0.5730 0.5334 -0.1301 -0.2662 0.6422 -0.1795 -0.1192 0.1208
#> -0.4553 -0.0139 0.4063 -0.4260 0.3107 0.5496 -0.2068 0.2386 -0.3083
#> 0.3888 -0.7720 -0.0870 0.2093 -0.3330 -0.1071 0.5038 -0.3719 0.0638
#>
#> Columns 10 to 18 -0.1528 -0.2028 0.3387 -0.0518 -0.6899 0.1681 -0.5209 -0.0571 0.5128
#> -0.3463 0.2185 -0.0698 0.7073 -0.5739 0.3294 -0.4612 0.3402 -0.1932
#> -0.6251 -0.1127 0.2339 -0.1328 -0.1214 0.6285 0.0687 -0.2767 -0.3338
#>
#> Columns 19 to 20 0.0089 -0.6351
#> -0.3738 -0.3503
#> 0.6556 0.5040
#>
#> (3,.,.) =
#> Columns 1 to 9 -0.2356 -0.6113 0.1987 0.1784 -0.0867 0.1171 0.6392 0.0872 -0.4478
#> 0.0514 -0.6339 0.5338 -0.2808 -0.4305 0.5131 0.2434 0.1849 -0.1520
#> 0.3242 0.1775 -0.4671 0.0266 0.0345 0.5796 -0.2894 -0.0200 -0.0918
#> ... [the output was truncated (use n=-1 to disable)]
#> [ CPUFloatType{5,3,20} ][ grad_fn = <StackBackward0> ]
#>
#> [[2]]
#> torch_tensor
#> (1,.,.) =
#> Columns 1 to 9 -0.0175 -0.3369 -0.3938 -0.8381 0.5141 -0.7798 -0.2962 0.3420 -0.0011
#> 0.5082 0.2114 0.0509 -0.1013 -0.3356 0.2145 -0.3867 0.4786 -0.2898
#> 0.2854 0.0152 0.0387 -0.0388 -0.1364 0.3562 -0.3331 0.1207 -0.3763
#>
#> Columns 10 to 18 -0.1939 -0.0752 -0.1715 -0.5644 -0.3147 0.2654 0.1487 0.0050 -0.8708
#> -0.1933 0.0206 -0.3729 -0.3629 0.0315 -0.4979 -0.0978 0.3018 -0.0699
#> -0.1006 -0.0743 -0.1578 -0.5160 -0.1038 0.3167 0.1170 0.4318 -0.4652
#>
#> Columns 19 to 20 -0.4842 0.6353
#> 0.2639 -0.3188
#> -0.1214 0.0327
#>
#> (2,.,.) =
#> Columns 1 to 9 0.2871 -0.0183 0.0965 0.0217 -0.2058 -0.2055 0.6852 -0.1807 -0.1259
#> 0.0607 -0.3838 -0.2404 0.1043 -0.2131 0.4428 0.1321 -0.2636 -0.3558
#> -0.0368 -0.2326 -0.3406 0.0439 -0.3058 -0.2124 0.3566 -0.5724 -0.2881
#>
#> Columns 10 to 18 -0.3794 -0.1391 0.0553 0.1581 -0.1880 0.3958 -0.3701 0.4224 -0.4409
#> -0.4106 0.4103 0.0546 0.4100 -0.3373 0.4400 0.4279 0.3742 0.1171
#> -0.6517 0.0637 0.1748 0.4747 -0.4689 0.3638 -0.0907 -0.1443 0.0389
#>
#> Columns 19 to 20 0.1622 -0.1112
#> -0.0212 -0.1135
#> 0.3322 0.2525
#> [ CPUFloatType{2,3,20} ][ grad_fn = <StackBackward0> ]
#>