Applies a multi-layer Elman RNN with tanh or ReLU non-linearity to an input sequence.
Usage
nn_rnn(
input_size,
hidden_size,
num_layers = 1,
nonlinearity = NULL,
bias = TRUE,
batch_first = FALSE,
dropout = 0,
bidirectional = FALSE,
...
)
Arguments
- input_size
The number of expected features in the input x
- hidden_size
The number of features in the hidden state h
- num_layers
Number of recurrent layers. E.g., setting num_layers=2 would mean stacking two RNNs together to form a stacked RNN, with the second RNN taking in outputs of the first RNN and computing the final results. Default: 1
- nonlinearity
The non-linearity to use. Can be either 'tanh' or 'relu'. Default: 'tanh'
- bias
If FALSE, then the layer does not use bias weights b_ih and b_hh. Default: TRUE
- batch_first
If TRUE, then the input and output tensors are provided as (batch, seq, feature); see the sketch after this argument list. Default: FALSE
- dropout
If non-zero, introduces a Dropout layer on the outputs of each RNN layer except the last layer, with dropout probability equal to dropout. Default: 0
- bidirectional
If TRUE, becomes a bidirectional RNN. Default: FALSE
- ...
other arguments that can be passed to the super class.
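The construction arguments above interact with the expected tensor layouts. Below is a minimal sketch (sizes chosen arbitrarily, not from the original page) of how batch_first changes the layout of the input and of the returned output when h_0 is left at its zero default.

library(torch)

# Minimal sketch: effect of batch_first on input/output layouts.
rnn_bf <- nn_rnn(input_size = 10, hidden_size = 20, num_layers = 2,
                 batch_first = TRUE)
x <- torch_randn(3, 5, 10)   # (batch, seq, feature) because batch_first = TRUE
res <- rnn_bf(x)             # h_0 defaults to zeros when omitted
res[[1]]$shape               # output: (3, 5, 20) -> (batch, seq, hidden_size)
res[[2]]$shape               # h_n:    (2, 3, 20) -> (num_layers, batch, hidden_size)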
Details
For each element in the input sequence, each layer computes the following function:

h_t = tanh(W_ih x_t + b_ih + W_hh h_(t-1) + b_hh)

where h_t is the hidden state at time t, x_t is the input at time t, and h_(t-1) is the hidden state of the previous layer at time t-1 or the initial hidden state at time 0. If nonlinearity is 'relu', then ReLU is used instead of tanh.
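As a sanity check, the recurrence above can be written out directly with tensor operations. The following is a minimal sketch, assuming the tanh non-linearity and randomly initialized weights; the names w_ih, w_hh, b_ih and b_hh are illustrative, not the module's stored parameters.

library(torch)

# Minimal sketch of a single-layer Elman recurrence (illustrative weights).
input_size  <- 10
hidden_size <- 20
n_steps     <- 5
batch       <- 3

w_ih <- torch_randn(hidden_size, input_size)
w_hh <- torch_randn(hidden_size, hidden_size)
b_ih <- torch_zeros(hidden_size)
b_hh <- torch_zeros(hidden_size)

x   <- torch_randn(n_steps, batch, input_size)
h_t <- torch_zeros(batch, hidden_size)   # initial hidden state at time 0

for (t in seq_len(n_steps)) {
  # h_t = tanh(W_ih x_t + b_ih + W_hh h_(t-1) + b_hh)
  h_t <- torch_tanh(x[t, , ]$matmul(w_ih$t()) + b_ih + h_t$matmul(w_hh$t()) + b_hh)
}
h_t$shape   # (batch, hidden_size) after the last time step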
Inputs
- input of shape (seq_len, batch, input_size): tensor containing the features of the input sequence. The input can also be a packed variable length sequence.
- h_0 of shape (num_layers * num_directions, batch, hidden_size): tensor containing the initial hidden state for each element in the batch. Defaults to zero if not provided. If the RNN is bidirectional, num_directions should be 2, else it should be 1.
Outputs
- output of shape (seq_len, batch, num_directions * hidden_size): tensor containing the output features (h_t) from the last layer of the RNN, for each t. If a nn_packed_sequence has been given as the input, the output will also be a packed sequence. For the unpacked case, the directions can be separated using output$view(seq_len, batch, num_directions, hidden_size), with forward and backward being direction 0 and 1 respectively. Similarly, the directions can be separated in the packed case.
- h_n of shape (num_layers * num_directions, batch, hidden_size): tensor containing the hidden state for t = seq_len. Like output, the layers can be separated using h_n$view(num_layers, num_directions, batch, hidden_size).
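For a bidirectional RNN, the reshaping described above can be used to separate the two directions of the output. A short sketch, assuming arbitrary sizes and using 1-based R indexing for the direction axis:

library(torch)

# Sketch: splitting a bidirectional output into its two directions.
rnn_bi <- nn_rnn(10, 20, num_layers = 1, bidirectional = TRUE)
input  <- torch_randn(5, 3, 10)              # (seq_len, batch, input_size)
res    <- rnn_bi(input)
output <- res[[1]]                           # (5, 3, 2 * 20)

dirs <- output$view(c(5, 3, 2, 20))          # (seq_len, batch, num_directions, hidden_size)
forward_out  <- dirs[ , , 1, ]               # first direction (forward)
backward_out <- dirs[ , , 2, ]               # second direction (backward)
forward_out$shape                            # (5, 3, 20)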
Shape
- Input1: (seq_len, batch, input_size) tensor containing input features, where seq_len represents a sequence length.
- Input2: (num_layers * num_directions, batch, hidden_size) tensor containing the initial hidden state for each element in the batch. Defaults to zero if not provided. If the RNN is bidirectional, num_directions should be 2, else it should be 1.
- Output1: (seq_len, batch, num_directions * hidden_size)
- Output2: (num_layers * num_directions, batch, hidden_size) tensor containing the next hidden state for each element in the batch.
Attributes
- weight_ih_l[k]: the learnable input-hidden weights of the k-th layer, of shape (hidden_size, input_size) for k = 0. Otherwise, the shape is (hidden_size, num_directions * hidden_size)
- weight_hh_l[k]: the learnable hidden-hidden weights of the k-th layer, of shape (hidden_size, hidden_size)
- bias_ih_l[k]: the learnable input-hidden bias of the k-th layer, of shape (hidden_size)
- bias_hh_l[k]: the learnable hidden-hidden bias of the k-th layer, of shape (hidden_size)
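The attributes listed above are ordinary module parameters and can be inspected at runtime. A small sketch, assuming only that the names follow the weight_ih_l[k] / weight_hh_l[k] pattern documented here:

library(torch)

# Sketch: listing the learnable parameters of a 2-layer RNN and their shapes.
rnn <- nn_rnn(10, 20, num_layers = 2)
names(rnn$parameters)                          # weight/bias names, one set per layer
lapply(rnn$parameters, function(p) p$shape)    # e.g. the first input-hidden weight is (20, 10)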
Examples
if (torch_is_installed()) {
rnn <- nn_rnn(10, 20, 2)         # 2-layer RNN with input_size = 10, hidden_size = 20
input <- torch_randn(5, 3, 10)   # (seq_len, batch, input_size)
h0 <- torch_randn(2, 3, 20)      # (num_layers * num_directions, batch, hidden_size)
rnn(input, h0)                   # returns list(output, h_n)
}
#> [[1]]
#> torch_tensor
#> (1,.,.) =
#> Columns 1 to 9 0.1138 0.6631 -0.1918 0.6281 0.2510 0.5953 -0.9136 -0.0814 -0.2429
#> -0.1266 0.7703 0.2245 -0.4914 0.2877 -0.6799 0.8433 0.2830 0.2680
#> -0.0520 0.8430 -0.0752 0.3551 -0.2969 -0.2672 -0.6215 -0.7958 -0.1652
#>
#> Columns 10 to 18 -0.3164 0.0610 -0.0020 -0.3274 0.4744 -0.2470 -0.2928 0.7699 -0.3805
#> -0.4374 0.6248 0.1377 0.3988 0.7886 -0.1949 -0.5301 -0.0371 -0.8580
#> 0.6870 0.8425 0.1067 -0.1855 0.2651 0.5158 0.6275 0.8245 -0.4980
#>
#> Columns 19 to 20 -0.8695 0.5809
#> 0.2636 0.6554
#> -0.1446 0.4440
#>
#> (2,.,.) =
#> Columns 1 to 9 -0.6179 0.2517 0.2674 0.0204 0.2723 -0.3134 -0.1389 -0.0190 0.1570
#> -0.1340 0.2771 0.1483 -0.2596 0.4032 -0.1611 0.1238 -0.1764 -0.0698
#> -0.5917 0.4475 -0.2211 -0.2794 0.0867 0.1524 -0.0418 0.3546 0.1035
#>
#> Columns 10 to 18 -0.0817 0.0739 -0.1066 -0.1188 0.0669 0.3599 0.4410 0.1185 -0.2348
#> 0.6463 0.7312 0.0089 -0.4626 -0.2593 0.0311 0.2613 0.3616 -0.1381
#> 0.1381 -0.4526 -0.7709 0.1242 -0.1917 -0.1203 0.5128 -0.5166 0.3311
#>
#> Columns 19 to 20 -0.5298 0.1405
#> 0.0107 0.2482
#> -0.3337 0.0425
#>
#> (3,.,.) =
#> Columns 1 to 9 -0.0461 -0.0263 0.0212 0.1693 -0.4990 -0.1146 -0.3431 0.4728 -0.1178
#> -0.2473 -0.1363 0.0523 0.1693 0.0459 -0.2329 -0.6231 0.4409 -0.4793
#> -0.2879 0.2178 -0.0925 0.3525 0.2685 0.0030 -0.3820 -0.2428 -0.0663
#> ... [the output was truncated (use n=-1 to disable)]
#> [ CPUFloatType{5,3,20} ][ grad_fn = <StackBackward0> ]
#>
#> [[2]]
#> torch_tensor
#> (1,.,.) =
#> Columns 1 to 9 -0.2696 0.0398 0.4615 0.3071 -0.5003 0.2484 -0.3660 0.0948 0.0016
#> -0.3122 -0.2843 -0.2927 0.1812 0.2122 0.5493 0.1288 0.4333 -0.3808
#> -0.9319 0.2868 -0.4655 0.0162 -0.0315 0.0888 -0.2612 -0.1192 -0.0668
#>
#> Columns 10 to 18 -0.3016 -0.0983 0.2063 0.6766 0.1090 0.0025 0.1229 -0.0869 -0.3359
#> -0.3524 -0.0248 0.3073 0.6688 0.3359 0.7473 0.1766 0.3498 -0.5443
#> 0.2575 0.4057 0.6756 -0.3496 -0.1808 -0.6011 -0.2010 -0.4891 -0.5971
#>
#> Columns 19 to 20 0.5769 -0.4665
#> 0.3955 -0.1531
#> -0.0789 -0.3875
#>
#> (2,.,.) =
#> Columns 1 to 9 -0.3803 0.4090 0.4323 0.1755 -0.2874 -0.1086 -0.2093 0.0970 -0.1678
#> -0.2449 0.2651 0.3549 0.1676 0.0407 0.0302 -0.3808 -0.1554 -0.1772
#> 0.4547 0.0682 -0.2861 -0.1158 -0.2140 -0.0062 -0.4872 -0.4629 -0.5972
#>
#> Columns 10 to 18 0.3575 -0.0950 -0.3082 0.0546 -0.3701 -0.3337 0.3244 -0.3924 -0.1071
#> 0.3914 -0.1311 -0.0138 -0.2266 0.3266 0.3640 0.4405 0.1752 -0.1293
#> 0.4631 -0.1551 -0.1552 -0.2861 0.4154 0.5296 0.4948 0.4903 0.3286
#>
#> Columns 19 to 20 -0.0363 0.0573
#> -0.1814 -0.1200
#> -0.6555 0.3351
#> [ CPUFloatType{2,3,20} ][ grad_fn = <StackBackward0> ]
#>