Applies a multi-layer Elman RNN with tanh or ReLU non-linearity to an input sequence.
Usage
nn_rnn(
  input_size,
  hidden_size,
  num_layers = 1,
  nonlinearity = NULL,
  bias = TRUE,
  batch_first = FALSE,
  dropout = 0,
  bidirectional = FALSE,
  ...
)
Arguments
- input_size
The number of expected features in the input x
- hidden_size
The number of features in the hidden state h
- num_layers
Number of recurrent layers. E.g., setting num_layers = 2 would mean stacking two RNNs together to form a stacked RNN, with the second RNN taking in outputs of the first RNN and computing the final results. Default: 1
- nonlinearity
The non-linearity to use. Can be either 'tanh' or 'relu'. Default: 'tanh'
- bias
If FALSE, then the layer does not use bias weights b_ih and b_hh. Default: TRUE
- batch_first
If TRUE, then the input and output tensors are provided as (batch, seq, feature); see the shape sketch after this list. Default: FALSE
- dropout
If non-zero, introduces a Dropout layer on the outputs of each RNN layer except the last layer, with dropout probability equal to dropout. Default: 0
- bidirectional
If TRUE, becomes a bidirectional RNN. Default: FALSE
- ...
Other arguments that can be passed to the super class.
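As a minimal sketch of how these arguments interact (the shapes assumed here follow the Inputs and Outputs sections below), batch_first = TRUE makes the module expect (batch, seq, feature) input, while bidirectional = TRUE doubles the last dimension of the output and the first dimension of the final hidden state:

library(torch)

rnn <- nn_rnn(
  input_size = 10, hidden_size = 20, num_layers = 2,
  nonlinearity = "relu", batch_first = TRUE, bidirectional = TRUE
)
x <- torch_randn(3, 5, 10)  # (batch, seq, feature) because batch_first = TRUE
out <- rnn(x)               # h_0 defaults to zeros when not supplied
out[[1]]$shape              # output: 3 5 40 = (batch, seq, num_directions * hidden_size)
out[[2]]$shape              # h_n:    4 3 20 = (num_layers * num_directions, batch, hidden_size)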
Details
For each element in the input sequence, each layer computes the following function:

h_t = \tanh(W_{ih} x_t + b_{ih} + W_{hh} h_{(t-1)} + b_{hh})

where h_t is the hidden state at time t, x_t is the input at time t, and h_{(t-1)} is the hidden state of the previous layer at time t-1, or the initial hidden state at time 0. If nonlinearity is 'relu', then ReLU is used instead of tanh.
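A minimal sketch of this recurrence for a single layer and a single time step follows; the weight and bias tensors are created ad hoc for illustration and are not the module's internal parameters:

library(torch)

input_size <- 4
hidden_size <- 6
w_ih <- torch_randn(hidden_size, input_size)   # stands in for W_ih
w_hh <- torch_randn(hidden_size, hidden_size)  # stands in for W_hh
b_ih <- torch_zeros(hidden_size)
b_hh <- torch_zeros(hidden_size)

x_t    <- torch_randn(input_size)   # input at time t
h_prev <- torch_zeros(hidden_size)  # hidden state at time t-1 (zero initial state)

# one Elman step: h_t = tanh(W_ih x_t + b_ih + W_hh h_{t-1} + b_hh)
h_t <- torch_tanh(
  torch_matmul(w_ih, x_t) + b_ih + torch_matmul(w_hh, h_prev) + b_hh
)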
Inputs
- input of shape (seq_len, batch, input_size): tensor containing the features of the input sequence. The input can also be a packed variable length sequence.
- h_0 of shape (num_layers * num_directions, batch, hidden_size): tensor containing the initial hidden state for each element in the batch. Defaults to zero if not provided. If the RNN is bidirectional, num_directions should be 2, else it should be 1.
Outputs
- output of shape (seq_len, batch, num_directions * hidden_size): tensor containing the output features (h_t) from the last layer of the RNN, for each t. If an nn_packed_sequence has been given as the input, the output will also be a packed sequence. For the unpacked case, the directions can be separated using output$view(seq_len, batch, num_directions, hidden_size), with forward and backward being direction 0 and 1 respectively (see the sketch below). Similarly, the directions can be separated in the packed case.
- h_n of shape (num_layers * num_directions, batch, hidden_size): tensor containing the hidden state for t = seq_len. Like output, the layers can be separated using h_n$view(num_layers, num_directions, batch, hidden_size).
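For the unpacked bidirectional case, a minimal sketch of separating the two directions (shapes chosen for illustration; note that with R's 1-based indexing the forward direction is index 1 and the backward direction index 2):

library(torch)

rnn <- nn_rnn(10, 20, num_layers = 1, bidirectional = TRUE)
input <- torch_randn(5, 3, 10)       # (seq_len, batch, input_size)
res <- rnn(input)
output <- res[[1]]                   # (seq_len, batch, num_directions * hidden_size) = (5, 3, 40)

dirs <- output$view(c(5, 3, 2, 20))  # (seq_len, batch, num_directions, hidden_size)
forward_out  <- dirs[ , , 1, ]       # forward direction
backward_out <- dirs[ , , 2, ]       # backward direction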
Shape
- Input1: (seq_len, batch, input_size) tensor containing input features, where seq_len is the sequence length.
- Input2: (num_layers * num_directions, batch, hidden_size) tensor containing the initial hidden state for each element in the batch. Defaults to zero if not provided. If the RNN is bidirectional, num_directions should be 2, else it should be 1.
- Output1: (seq_len, batch, num_directions * hidden_size)
- Output2: (num_layers * num_directions, batch, hidden_size) tensor containing the next hidden state for each element in the batch.
Attributes
- weight_ih_l[k]: the learnable input-hidden weights of the k-th layer, of shape (hidden_size, input_size) for k = 0. Otherwise, the shape is (hidden_size, num_directions * hidden_size)
- weight_hh_l[k]: the learnable hidden-hidden weights of the k-th layer, of shape (hidden_size, hidden_size)
- bias_ih_l[k]: the learnable input-hidden bias of the k-th layer, of shape (hidden_size)
- bias_hh_l[k]: the learnable hidden-hidden bias of the k-th layer, of shape (hidden_size)
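These attributes are exposed as the module's parameters. A minimal sketch of inspecting their names and shapes (the names are expected to follow the weight_ih_l[k] / bias_hh_l[k] convention above):

library(torch)

rnn <- nn_rnn(10, 20, num_layers = 2)
lapply(rnn$parameters, function(p) p$shape)  # named list of parameter shapes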
Examples
if (torch_is_installed()) {
rnn <- nn_rnn(10, 20, 2)        # input_size = 10, hidden_size = 20, num_layers = 2
input <- torch_randn(5, 3, 10)  # (seq_len, batch, input_size)
h0 <- torch_randn(2, 3, 20)     # (num_layers * num_directions, batch, hidden_size)
rnn(input, h0)
}
#> [[1]]
#> torch_tensor
#> (1,.,.) =
#> Columns 1 to 9 -0.6879 -0.7426 0.6235 -0.3436 -0.4580 -0.1856 -0.4237 -0.3653 -0.4998
#> 0.4247 -0.2906 -0.7126 -0.1000 0.8497 0.1392 -0.8217 -0.9465 0.5423
#> -0.4518 0.0759 0.5993 -0.6274 -0.3062 0.8981 -0.5849 -0.4103 -0.6327
#>
#> Columns 10 to 18 0.1637 0.8972 0.7252 -0.4360 -0.8445 -0.4745 0.6004 -0.9378 -0.9365
#> -0.2678 -0.3445 -0.3011 0.7570 -0.2714 0.3795 0.1486 -0.5424 0.6607
#> 0.6566 0.4242 -0.0694 -0.1700 -0.6498 -0.4550 0.8156 0.6498 0.2456
#>
#> Columns 19 to 20 0.3353 -0.1598
#> -0.6867 -0.0419
#> 0.3595 0.6127
#>
#> (2,.,.) =
#> Columns 1 to 9 -0.3211 -0.4165 -0.3939 -0.4113 0.1727 0.8935 -0.1335 -0.3022 -0.2301
#> -0.0005 -0.3060 0.6773 0.2520 -0.2338 0.0369 -0.6751 -0.7390 -0.5254
#> -0.3416 -0.5301 0.4733 0.5604 0.4544 0.3200 -0.6725 -0.5782 0.7411
#>
#> Columns 10 to 18 -0.0191 0.2358 0.4651 0.3371 -0.0938 0.0556 0.7139 -0.1330 -0.0128
#> -0.1932 -0.3819 0.8335 -0.6837 0.1062 0.1868 -0.3358 -0.4554 -0.0022
#> -0.6284 -0.6811 0.0271 -0.3335 0.1275 0.7442 0.6518 -0.5527 0.3286
#>
#> Columns 19 to 20 -0.8963 -0.0503
#> 0.2398 -0.0630
#> -0.5265 -0.0431
#>
#> (3,.,.) =
#> Columns 1 to 9 -0.1605 -0.3622 -0.1896 0.0736 0.4395 0.4924 -0.4529 -0.6188 -0.3640
#> 0.2473 0.0090 0.2528 0.0808 0.2548 0.3442 -0.3155 -0.5752 -0.6046
#> -0.4337 0.1751 0.3389 -0.3255 0.4379 0.4305 -0.3292 -0.6260 -0.1039
#> ... [the output was truncated (use n=-1 to disable)]
#> [ CPUFloatType{5,3,20} ][ grad_fn = <StackBackward0> ]
#>
#> [[2]]
#> torch_tensor
#> (1,.,.) =
#> Columns 1 to 9 -0.4947 0.7403 -0.3566 0.5769 -0.1051 -0.1778 0.7845 0.7701 -0.3923
#> 0.2876 -0.1795 -0.0255 -0.4477 -0.0979 -0.1793 0.6604 0.8124 -0.3598
#> -0.5958 0.3703 -0.7741 0.2157 0.5954 0.0792 -0.4231 0.4005 -0.4394
#>
#> Columns 10 to 18 -0.8687 0.6104 -0.0347 0.6698 -0.6539 0.3492 0.5389 -0.4577 -0.2348
#> -0.2451 0.4365 -0.1462 0.1930 -0.1441 0.0514 0.0577 -0.1735 0.0346
#> 0.0169 -0.0450 0.2458 -0.5125 -0.5027 0.1446 -0.0686 0.5543 -0.0073
#>
#> Columns 19 to 20 0.3049 -0.3823
#> 0.4061 -0.4322
#> -0.6089 -0.2171
#>
#> (2,.,.) =
#> Columns 1 to 9 -0.0666 -0.0614 -0.0416 -0.2682 0.3494 0.5832 0.0781 -0.4075 -0.2602
#> -0.0314 -0.0063 -0.0860 -0.3765 0.3944 0.4857 -0.0903 -0.2619 -0.5838
#> -0.3065 -0.0933 0.1779 -0.1290 0.6005 0.2842 -0.6555 -0.6870 0.5871
#>
#> Columns 10 to 18 0.0521 0.3115 0.2886 -0.1778 -0.2873 -0.2687 0.8145 -0.3960 -0.0372
#> -0.1158 -0.1020 0.5324 -0.2693 -0.1165 0.1864 0.2972 -0.3392 -0.0083
#> -0.3264 0.1265 0.1364 -0.1725 0.2156 0.1005 0.3544 -0.6621 0.4059
#>
#> Columns 19 to 20 -0.7191 -0.2716
#> -0.6820 -0.5899
#> -0.7142 -0.0657
#> [ CPUFloatType{2,3,20} ][ grad_fn = <StackBackward0> ]
#>