
Applies a multi-layer Elman RNN with tanh or ReLU non-linearity to an input sequence.

Usage

nn_rnn(
  input_size,
  hidden_size,
  num_layers = 1,
  nonlinearity = NULL,
  bias = TRUE,
  batch_first = FALSE,
  dropout = 0,
  bidirectional = FALSE,
  ...
)

Arguments

input_size

The number of expected features in the input x

hidden_size

The number of features in the hidden state h

num_layers

Number of recurrent layers. E.g., setting num_layers=2 would mean stacking two RNNs together to form a stacked RNN, with the second RNN taking in outputs of the first RNN and computing the final results. Default: 1

nonlinearity

The non-linearity to use. Can be either 'tanh' or 'relu'. Default: 'tanh'

bias

If FALSE, then the layer does not use bias weights b_ih and b_hh. Default: TRUE

batch_first

If TRUE, then the input and output tensors are provided as (batch, seq, feature). Default: FALSE

dropout

If non-zero, introduces a Dropout layer on the outputs of each RNN layer except the last layer, with dropout probability equal to dropout. Default: 0

bidirectional

If TRUE, becomes a bidirectional RNN. Default: FALSE

...

Other arguments that can be passed to the super class.
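
As a quick illustration of these arguments, here is a hedged sketch (the sizes and the 0.2 dropout rate are arbitrary) that builds a batch-first, two-layer ReLU RNN:

if (torch_is_installed()) {
rnn <- nn_rnn(
  input_size = 10, hidden_size = 20, num_layers = 2,
  nonlinearity = "relu", batch_first = TRUE, dropout = 0.2
)
x <- torch_randn(8, 5, 10) # (batch, seq, feature) since batch_first = TRUE
out <- rnn(x)
out[[1]]$shape # (8, 5, 20): batch-first outputs
out[[2]]$shape # (2, 8, 20): h_n keeps the (num_layers, batch, hidden_size) layout
}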

Details

For each element in the input sequence, each layer computes the following function:

h_t = \tanh(W_{ih} x_t + b_{ih} + W_{hh} h_{(t-1)} + b_{hh})

where h_t is the hidden state at time t, x_t is the input at time t, and h_(t-1) is the hidden state of the previous layer at time t-1 or the initial hidden state at time 0. If nonlinearity is 'relu', then ReLU is used instead of tanh.
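
The recurrence can be reproduced by hand. Below is a minimal sketch (not the module's actual internals), using the weight_ih_l0, weight_hh_l0, bias_ih_l0 and bias_hh_l0 parameters listed under Attributes:

if (torch_is_installed()) {
rnn <- nn_rnn(input_size = 4, hidden_size = 3, num_layers = 1)
p <- rnn$parameters
x_t <- torch_randn(4)    # input at time t
h_prev <- torch_zeros(3) # initial hidden state (the default is zeros)
h_t <- torch_tanh(
  torch_matmul(p$weight_ih_l0, x_t) + p$bias_ih_l0 +
    torch_matmul(p$weight_hh_l0, h_prev) + p$bias_hh_l0
)
# should match the module applied to a length-1 sequence with batch size 1
out <- rnn(x_t$view(c(1, 1, 4)))
torch_allclose(out[[1]][1, 1, ], h_t)
}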

Inputs

  • input of shape (seq_len, batch, input_size): tensor containing the features of the input sequence. The input can also be a packed variable length sequence.

  • h_0 of shape (num_layers * num_directions, batch, hidden_size): tensor containing the initial hidden state for each element in the batch. Defaults to zero if not provided. If the RNN is bidirectional, num_directions should be 2, else it should be 1.

Outputs

  • output of shape (seq_len, batch, num_directions * hidden_size): tensor containing the output features (h_t) from the last layer of the RNN, for each t. If a packed sequence has been given as the input, the output will also be a packed sequence. For the unpacked case, the directions can be separated using output$view(c(seq_len, batch, num_directions, hidden_size)), with forward and backward being direction 0 and 1 respectively (see the sketch after this list). Similarly, the directions can be separated in the packed case.

  • h_n of shape (num_layers * num_directions, batch, hidden_size): tensor containing the hidden state for t = seq_len. Like output, the layers can be separated using h_n$view(c(num_layers, num_directions, batch, hidden_size)).
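
As noted above, the directions of an unpacked bidirectional output can be separated with $view(). A short sketch (sizes arbitrary; R indexing is 1-based, so the forward direction sits at index 1 and the backward direction at index 2):

if (torch_is_installed()) {
rnn <- nn_rnn(10, 20, num_layers = 1, bidirectional = TRUE)
input <- torch_randn(5, 3, 10) # (seq_len, batch, input_size)
res <- rnn(input)
output <- res[[1]] # (5, 3, 2 * 20)
dirs <- output$view(c(5, 3, 2, 20)) # (seq_len, batch, num_directions, hidden_size)
forward <- dirs[, , 1, ]  # direction 0 in the 0-based convention above
backward <- dirs[, , 2, ] # direction 1
}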

Shape

  • Input1: (L, N, H_in) tensor containing input features, where H_in = input_size and L represents the sequence length.

  • Input2: (S, N, H_out) tensor containing the initial hidden state for each element in the batch, where H_out = hidden_size and S = num_layers * num_directions. Defaults to zero if not provided. If the RNN is bidirectional, num_directions should be 2, else it should be 1.

  • Output1: (L, N, H_all) where H_all = num_directions * hidden_size.

  • Output2: (S, N, H_out) tensor containing the next hidden state for each element in the batch.
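
These shapes are easy to verify; a sketch with L = 7, N = 3 and a stacked bidirectional module, so that S = 2 * 2 = 4:

if (torch_is_installed()) {
rnn <- nn_rnn(input_size = 10, hidden_size = 20, num_layers = 2,
  bidirectional = TRUE)
res <- rnn(torch_randn(7, 3, 10)) # (L, N, H_in)
res[[1]]$shape # (7, 3, 40): (L, N, num_directions * hidden_size)
res[[2]]$shape # (4, 3, 20): (S, N, H_out)
}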

Attributes

  • weight_ih_l[k]: the learnable input-hidden weights of the k-th layer, of shape (hidden_size, input_size) for k = 0. Otherwise, the shape is (hidden_size, num_directions * hidden_size)

  • weight_hh_l[k]: the learnable hidden-hidden weights of the k-th layer, of shape (hidden_size, hidden_size)

  • bias_ih_l[k]: the learnable input-hidden bias of the k-th layer, of shape (hidden_size)

  • bias_hh_l[k]: the learnable hidden-hidden bias of the k-th layer, of shape (hidden_size)
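
These attributes are reachable through the module's parameters list; a small sketch with a two-layer unidirectional module:

if (torch_is_installed()) {
rnn <- nn_rnn(input_size = 10, hidden_size = 20, num_layers = 2)
names(rnn$parameters) # weight_ih_l0, weight_hh_l0, bias_ih_l0, ...
rnn$parameters$weight_ih_l0$shape # (20, 10): (hidden_size, input_size) for k = 0
rnn$parameters$weight_ih_l1$shape # (20, 20): (hidden_size, num_directions * hidden_size)
}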

Note

All the weights and biases are initialized from \mathcal{U}(-\sqrt{k}, \sqrt{k}), where k = 1/hidden_size.
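
A quick check of that range: with hidden_size = 20, k = 1/20 and sqrt(k) is about 0.2236, so every weight and bias should fall inside (-0.2236, 0.2236):

if (torch_is_installed()) {
rnn <- nn_rnn(10, 20)
w <- rnn$parameters$weight_ih_l0
c(as.numeric(w$min()), as.numeric(w$max())) # both within (-sqrt(1/20), sqrt(1/20))
}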

Examples

if (torch_is_installed()) {
rnn <- nn_rnn(10, 20, 2)
input <- torch_randn(5, 3, 10)
h0 <- torch_randn(2, 3, 20)
rnn(input, h0)
}
#> [[1]]
#> torch_tensor
#> (1,.,.) = 
#>  Columns 1 to 9 -0.6879 -0.7426  0.6235 -0.3436 -0.4580 -0.1856 -0.4237 -0.3653 -0.4998
#>   0.4247 -0.2906 -0.7126 -0.1000  0.8497  0.1392 -0.8217 -0.9465  0.5423
#>  -0.4518  0.0759  0.5993 -0.6274 -0.3062  0.8981 -0.5849 -0.4103 -0.6327
#> 
#> Columns 10 to 18  0.1637  0.8972  0.7252 -0.4360 -0.8445 -0.4745  0.6004 -0.9378 -0.9365
#>  -0.2678 -0.3445 -0.3011  0.7570 -0.2714  0.3795  0.1486 -0.5424  0.6607
#>   0.6566  0.4242 -0.0694 -0.1700 -0.6498 -0.4550  0.8156  0.6498  0.2456
#> 
#> Columns 19 to 20  0.3353 -0.1598
#>  -0.6867 -0.0419
#>   0.3595  0.6127
#> 
#> (2,.,.) = 
#>  Columns 1 to 9 -0.3211 -0.4165 -0.3939 -0.4113  0.1727  0.8935 -0.1335 -0.3022 -0.2301
#>  -0.0005 -0.3060  0.6773  0.2520 -0.2338  0.0369 -0.6751 -0.7390 -0.5254
#>  -0.3416 -0.5301  0.4733  0.5604  0.4544  0.3200 -0.6725 -0.5782  0.7411
#> 
#> Columns 10 to 18 -0.0191  0.2358  0.4651  0.3371 -0.0938  0.0556  0.7139 -0.1330 -0.0128
#>  -0.1932 -0.3819  0.8335 -0.6837  0.1062  0.1868 -0.3358 -0.4554 -0.0022
#>  -0.6284 -0.6811  0.0271 -0.3335  0.1275  0.7442  0.6518 -0.5527  0.3286
#> 
#> Columns 19 to 20 -0.8963 -0.0503
#>   0.2398 -0.0630
#>  -0.5265 -0.0431
#> 
#> (3,.,.) = 
#>  Columns 1 to 9 -0.1605 -0.3622 -0.1896  0.0736  0.4395  0.4924 -0.4529 -0.6188 -0.3640
#>   0.2473  0.0090  0.2528  0.0808  0.2548  0.3442 -0.3155 -0.5752 -0.6046
#>  -0.4337  0.1751  0.3389 -0.3255  0.4379  0.4305 -0.3292 -0.6260 -0.1039
#> ... [the output was truncated (use n=-1 to disable)]
#> [ CPUFloatType{5,3,20} ][ grad_fn = <StackBackward0> ]
#> 
#> [[2]]
#> torch_tensor
#> (1,.,.) = 
#>  Columns 1 to 9 -0.4947  0.7403 -0.3566  0.5769 -0.1051 -0.1778  0.7845  0.7701 -0.3923
#>   0.2876 -0.1795 -0.0255 -0.4477 -0.0979 -0.1793  0.6604  0.8124 -0.3598
#>  -0.5958  0.3703 -0.7741  0.2157  0.5954  0.0792 -0.4231  0.4005 -0.4394
#> 
#> Columns 10 to 18 -0.8687  0.6104 -0.0347  0.6698 -0.6539  0.3492  0.5389 -0.4577 -0.2348
#>  -0.2451  0.4365 -0.1462  0.1930 -0.1441  0.0514  0.0577 -0.1735  0.0346
#>   0.0169 -0.0450  0.2458 -0.5125 -0.5027  0.1446 -0.0686  0.5543 -0.0073
#> 
#> Columns 19 to 20  0.3049 -0.3823
#>   0.4061 -0.4322
#>  -0.6089 -0.2171
#> 
#> (2,.,.) = 
#>  Columns 1 to 9 -0.0666 -0.0614 -0.0416 -0.2682  0.3494  0.5832  0.0781 -0.4075 -0.2602
#>  -0.0314 -0.0063 -0.0860 -0.3765  0.3944  0.4857 -0.0903 -0.2619 -0.5838
#>  -0.3065 -0.0933  0.1779 -0.1290  0.6005  0.2842 -0.6555 -0.6870  0.5871
#> 
#> Columns 10 to 18  0.0521  0.3115  0.2886 -0.1778 -0.2873 -0.2687  0.8145 -0.3960 -0.0372
#>  -0.1158 -0.1020  0.5324 -0.2693 -0.1165  0.1864  0.2972 -0.3392 -0.0083
#>  -0.3264  0.1265  0.1364 -0.1725  0.2156  0.1005  0.3544 -0.6621  0.4059
#> 
#> Columns 19 to 20 -0.7191 -0.2716
#>  -0.6820 -0.5899
#>  -0.7142 -0.0657
#> [ CPUFloatType{2,3,20} ][ grad_fn = <StackBackward0> ]
#>