Applies a multi-layer Elman RNN with tanh or ReLU non-linearity to an input sequence.

Usage

nn_rnn(
  input_size,
  hidden_size,
  num_layers = 1,
  nonlinearity = NULL,
  bias = TRUE,
  batch_first = FALSE,
  dropout = 0,
  bidirectional = FALSE,
  ...
)

Arguments

input_size

The number of expected features in the input x

hidden_size

The number of features in the hidden state h

num_layers

Number of recurrent layers. E.g., setting num_layers=2 would mean stacking two RNNs together to form a stacked RNN, with the second RNN taking in outputs of the first RNN and computing the final results. Default: 1

nonlinearity

The non-linearity to use. Can be either 'tanh' or 'relu'. Default: 'tanh' (a NULL value is treated as 'tanh')

bias

If FALSE, then the layer does not use bias weights b_ih and b_hh. Default: TRUE

batch_first

If TRUE, then the input and output tensors are provided as (batch, seq, feature). Default: FALSE

dropout

If non-zero, introduces a Dropout layer on the outputs of each RNN layer except the last layer, with dropout probability equal to dropout. Default: 0

bidirectional

If TRUE, becomes a bidirectional RNN. Default: FALSE

...

Other arguments that can be passed to the super class.
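
For illustration, a few constructor calls combining these arguments (a minimal sketch; the sizes are arbitrary):

library(torch)

# tanh non-linearity (the default), two stacked layers
rnn <- nn_rnn(input_size = 10, hidden_size = 20, num_layers = 2)

# ReLU non-linearity, batch-first tensors, dropout between layers
rnn_relu <- nn_rnn(10, 20, num_layers = 2, nonlinearity = "relu",
                   batch_first = TRUE, dropout = 0.2)

# bidirectional variant
rnn_bi <- nn_rnn(10, 20, bidirectional = TRUE)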

Details

For each element in the input sequence, each layer computes the following function:

h_t = tanh(W_ih x_t + b_ih + W_hh h_(t-1) + b_hh)

where h_t is the hidden state at time t, x_t is the input at time t, and h_(t-1) is the hidden state of the previous layer at time t-1 or the initial hidden state at time 0. If nonlinearity is 'relu', then ReLU is used instead of tanh.
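
As a rough illustration of this recurrence, the sketch below computes a single step for one layer with plain tensor operations. The names W_ih, W_hh, b_ih, b_hh, x_t and h_prev only mirror the formula above; they are not part of the module's API:

library(torch)

input_size <- 10
hidden_size <- 20

# randomly initialized weights and biases for one layer
W_ih <- torch_randn(hidden_size, input_size)
W_hh <- torch_randn(hidden_size, hidden_size)
b_ih <- torch_randn(hidden_size)
b_hh <- torch_randn(hidden_size)

x_t <- torch_randn(input_size)     # input at time t
h_prev <- torch_zeros(hidden_size) # hidden state at time t - 1

# one step of the recurrence
h_t <- torch_tanh(torch_matmul(W_ih, x_t) + b_ih +
  torch_matmul(W_hh, h_prev) + b_hh)
h_t$shape # hidden_size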

Inputs

  • input of shape (seq_len, batch, input_size): tensor containing the features of the input sequence. The input can also be a packed variable length sequence.

  • h_0 of shape (num_layers * num_directions, batch, hidden_size): tensor containing the initial hidden state for each element in the batch. Defaults to zero if not provided. If the RNN is bidirectional, num_directions should be 2, else it should be 1.

Outputs

  • output of shape (seq_len, batch, num_directions * hidden_size): tensor containing the output features (h_t) from the last layer of the RNN, for each t. If a packed sequence has been given as the input, the output will also be a packed sequence. For the unpacked case, the directions can be separated using output$view(c(seq_len, batch, num_directions, hidden_size)), with forward and backward being the first and second direction respectively (see the sketch after this list). Similarly, the directions can be separated in the packed case.

  • h_n of shape (num_layers * num_directions, batch, hidden_size): tensor containing the hidden state for t = seq_len. Like output, the layers can be separated using h_n$view(c(num_layers, num_directions, batch, hidden_size)).
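
For example, a minimal sketch of separating the directions of a bidirectional RNN (the sizes are arbitrary; in R, the direction dimension is indexed 1 for forward and 2 for backward):

library(torch)

rnn <- nn_rnn(10, 20, bidirectional = TRUE)
input <- torch_randn(5, 3, 10)       # (seq_len, batch, input_size)
out <- rnn(input)                    # h_0 defaults to zeros
output <- out[[1]]                   # (5, 3, 2 * 20)
h_n <- out[[2]]                      # (2, 3, 20)

dirs <- output$view(c(5, 3, 2, 20))  # (seq_len, batch, num_directions, hidden_size)
forward <- dirs[, , 1, ]             # (5, 3, 20)
backward <- dirs[, , 2, ]            # (5, 3, 20)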

Shape

  • Input1: (L, N, H_in) tensor containing input features, where H_in = input_size and L is the sequence length.

  • Input2: (S, N, H_out) tensor containing the initial hidden state for each element in the batch, where H_out = hidden_size and S = num_layers * num_directions. Defaults to zero if not provided. If the RNN is bidirectional, num_directions should be 2, else it should be 1.

  • Output1: (L, N, H_all) tensor, where H_all = num_directions * hidden_size.

  • Output2: (S, N, H_out) tensor containing the next hidden state for each element in the batch.
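
As a sketch of how batch_first changes these shapes (note that h_n keeps the (S, N, H_out) layout either way):

library(torch)

rnn_bf <- nn_rnn(10, 20, batch_first = TRUE)
input <- torch_randn(3, 5, 10) # (batch, seq_len, input_size)
out <- rnn_bf(input)
out[[1]]$shape # 3 5 20 -- (batch, seq_len, hidden_size)
out[[2]]$shape # 1 3 20 -- h_n is unaffected by batch_first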

Attributes

  • weight_ih_l[k]: the learnable input-hidden weights of the k-th layer, of shape (hidden_size, input_size) for k = 0. Otherwise, the shape is (hidden_size, num_directions * hidden_size)

  • weight_hh_l[k]: the learnable hidden-hidden weights of the k-th layer, of shape (hidden_size, hidden_size)

  • bias_ih_l[k]: the learnable input-hidden bias of the k-th layer, of shape (hidden_size)

  • bias_hh_l[k]: the learnable hidden-hidden bias of the k-th layer, of shape (hidden_size)
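
These can be inspected through the module's parameters field (a sketch; the actual names follow the weight_ih_l[k] scheme above):

library(torch)

rnn <- nn_rnn(10, 20, num_layers = 2)
names(rnn$parameters)                       # parameter names, one set per layer
lapply(rnn$parameters, function(p) p$shape) # shapes as listed above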

Note

All the weights and biases are initialized from U(-sqrt(k), sqrt(k)) where k = 1/hidden_size.
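
A sketch that checks this bound empirically (as_array converts a tensor to an R array; here hidden_size = 20, so k = 1/20):

library(torch)

rnn <- nn_rnn(10, 20)
limit <- sqrt(1 / 20) # sqrt(k)
all(vapply(
  rnn$parameters,
  function(p) max(abs(as_array(p))) <= limit,
  logical(1)
))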

Examples

if (torch_is_installed()) {
rnn <- nn_rnn(10, 20, 2)       # input_size = 10, hidden_size = 20, num_layers = 2
input <- torch_randn(5, 3, 10) # (seq_len, batch, input_size)
h0 <- torch_randn(2, 3, 20)    # (num_layers * num_directions, batch, hidden_size)
rnn(input, h0)                 # returns list(output, h_n)
}
#> [[1]]
#> torch_tensor
#> (1,.,.) = 
#>  Columns 1 to 9  0.5737  0.8175  0.7715 -0.4437 -0.0536  0.3481 -0.1660  0.5390  0.6151
#>   0.0435 -0.0704 -0.0372 -0.3621  0.9216  0.2836  0.6439  0.3410  0.1812
#>   0.7230 -0.6420 -0.7753 -0.3005 -0.6238  0.6109 -0.8732 -0.9056  0.7666
#> 
#> Columns 10 to 18  0.4223 -0.2744 -0.6122  0.5143 -0.6387  0.6037 -0.6352  0.5786 -0.3871
#>   0.2441 -0.6054 -0.0027 -0.3015 -0.8889 -0.2609 -0.0350  0.1835 -0.4018
#>  -0.8197 -0.4651  0.8905 -0.4860 -0.6609  0.7780  0.8382  0.5023  0.0108
#> 
#> Columns 19 to 20  0.6311 -0.4435
#>   0.7361  0.4022
#>   0.7677  0.2430
#> 
#> (2,.,.) = 
#>  Columns 1 to 9 -0.1284 -0.5730  0.5334 -0.1301 -0.2662  0.6422 -0.1795 -0.1192  0.1208
#>  -0.4553 -0.0139  0.4063 -0.4260  0.3107  0.5496 -0.2068  0.2386 -0.3083
#>   0.3888 -0.7720 -0.0870  0.2093 -0.3330 -0.1071  0.5038 -0.3719  0.0638
#> 
#> Columns 10 to 18 -0.1528 -0.2028  0.3387 -0.0518 -0.6899  0.1681 -0.5209 -0.0571  0.5128
#>  -0.3463  0.2185 -0.0698  0.7073 -0.5739  0.3294 -0.4612  0.3402 -0.1932
#>  -0.6251 -0.1127  0.2339 -0.1328 -0.1214  0.6285  0.0687 -0.2767 -0.3338
#> 
#> Columns 19 to 20  0.0089 -0.6351
#>  -0.3738 -0.3503
#>   0.6556  0.5040
#> 
#> (3,.,.) = 
#>  Columns 1 to 9 -0.2356 -0.6113  0.1987  0.1784 -0.0867  0.1171  0.6392  0.0872 -0.4478
#>   0.0514 -0.6339  0.5338 -0.2808 -0.4305  0.5131  0.2434  0.1849 -0.1520
#>   0.3242  0.1775 -0.4671  0.0266  0.0345  0.5796 -0.2894 -0.0200 -0.0918
#> ... [the output was truncated (use n=-1 to disable)]
#> [ CPUFloatType{5,3,20} ][ grad_fn = <StackBackward0> ]
#> 
#> [[2]]
#> torch_tensor
#> (1,.,.) = 
#>  Columns 1 to 9 -0.0175 -0.3369 -0.3938 -0.8381  0.5141 -0.7798 -0.2962  0.3420 -0.0011
#>   0.5082  0.2114  0.0509 -0.1013 -0.3356  0.2145 -0.3867  0.4786 -0.2898
#>   0.2854  0.0152  0.0387 -0.0388 -0.1364  0.3562 -0.3331  0.1207 -0.3763
#> 
#> Columns 10 to 18 -0.1939 -0.0752 -0.1715 -0.5644 -0.3147  0.2654  0.1487  0.0050 -0.8708
#>  -0.1933  0.0206 -0.3729 -0.3629  0.0315 -0.4979 -0.0978  0.3018 -0.0699
#>  -0.1006 -0.0743 -0.1578 -0.5160 -0.1038  0.3167  0.1170  0.4318 -0.4652
#> 
#> Columns 19 to 20 -0.4842  0.6353
#>   0.2639 -0.3188
#>  -0.1214  0.0327
#> 
#> (2,.,.) = 
#>  Columns 1 to 9  0.2871 -0.0183  0.0965  0.0217 -0.2058 -0.2055  0.6852 -0.1807 -0.1259
#>   0.0607 -0.3838 -0.2404  0.1043 -0.2131  0.4428  0.1321 -0.2636 -0.3558
#>  -0.0368 -0.2326 -0.3406  0.0439 -0.3058 -0.2124  0.3566 -0.5724 -0.2881
#> 
#> Columns 10 to 18 -0.3794 -0.1391  0.0553  0.1581 -0.1880  0.3958 -0.3701  0.4224 -0.4409
#>  -0.4106  0.4103  0.0546  0.4100 -0.3373  0.4400  0.4279  0.3742  0.1171
#>  -0.6517  0.0637  0.1748  0.4747 -0.4689  0.3638 -0.0907 -0.1443  0.0389
#> 
#> Columns 19 to 20  0.1622 -0.1112
#>  -0.0212 -0.1135
#>   0.3322  0.2525
#> [ CPUFloatType{2,3,20} ][ grad_fn = <StackBackward0> ]
#>