
Applies a multi-layer long short-term memory (LSTM) RNN to an input sequence. For each element in the input sequence, each layer computes the function given under Details.

Usage

nn_lstm(
  input_size,
  hidden_size,
  num_layers = 1,
  bias = TRUE,
  batch_first = FALSE,
  dropout = 0,
  bidirectional = FALSE,
  ...
)

Arguments

input_size

The number of expected features in the input x

hidden_size

The number of features in the hidden state h

num_layers

Number of recurrent layers. E.g., setting num_layers=2 would mean stacking two LSTMs together to form a stacked LSTM, with the second LSTM taking in outputs of the first LSTM and computing the final results. Default: 1

bias

If FALSE, then the layer does not use bias weights b_ih and b_hh. Default: TRUE

batch_first

If TRUE, then the input and output tensors are provided as (batch, seq, feature) instead of (seq, batch, feature). This does not apply to the hidden or cell states. Default: FALSE

dropout

If non-zero, introduces a Dropout layer on the outputs of each LSTM layer except the last layer, with dropout probability equal to dropout. Default: 0

bidirectional

If TRUE, becomes a bidirectional LSTM. Default: FALSE

...

currently unused.
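
As a rough illustration of how batch_first, bidirectional, and num_layers affect tensor shapes, the following sketch builds a small module and inspects its results. The sizes (3, 5, 10, 20) are arbitrary values chosen for the example, and the snippet assumes the torch package is installed.

```r
library(torch)

# Arbitrary example sizes: batch = 3, seq = 5, feature = 10
x <- torch_randn(3, 5, 10)

rnn <- nn_lstm(
  input_size = 10,
  hidden_size = 20,
  num_layers = 2,
  batch_first = TRUE,
  bidirectional = TRUE
)

out <- rnn(x)

# output follows batch_first: (batch, seq, num_directions * hidden_size)
out_shape <- out[[1]]$shape
print(out_shape)

# h_n does not follow batch_first:
# (num_layers * num_directions, batch, hidden_size)
h_n_shape <- out[[2]][[1]]$shape
print(h_n_shape)
```

With these settings the output should have shape (3, 5, 40) and h_n should have shape (4, 3, 20).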

Details

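$$
\begin{array}{ll}
i_t = \sigma(W_{ii} x_t + b_{ii} + W_{hi} h_{(t-1)} + b_{hi}) \\
f_t = \sigma(W_{if} x_t + b_{if} + W_{hf} h_{(t-1)} + b_{hf}) \\
g_t = \tanh(W_{ig} x_t + b_{ig} + W_{hg} h_{(t-1)} + b_{hg}) \\
o_t = \sigma(W_{io} x_t + b_{io} + W_{ho} h_{(t-1)} + b_{ho}) \\
c_t = f_t \odot c_{(t-1)} + i_t \odot g_t \\
h_t = o_t \odot \tanh(c_t)
\end{array}
$$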

where h_t is the hidden state at time t, c_t is the cell state at time t, x_t is the input at time t, h_(t-1) is the hidden state of the layer at time t-1 or the initial hidden state at time 0, and i_t, f_t, g_t, o_t are the input, forget, cell, and output gates, respectively. σ is the sigmoid function, and the products in the c_t and h_t equations are element-wise.
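
The gate equations can be traced by hand for a single time step. The sketch below is an illustration under stated assumptions, not the library's internal implementation: it pulls the parameters out of a one-layer, unidirectional nn_lstm, applies the equations directly, and compares the result with the module's own output. The helper pick() is hypothetical and looks parameters up by name prefix, and the gate order (i, f, g, o) follows the layout described under Attributes.

```r
library(torch)
torch_manual_seed(42)

# Arbitrary small sizes for the illustration
input_size <- 4
hidden_size <- 3
batch <- 2

rnn <- nn_lstm(input_size, hidden_size)

# Hypothetical helper: first parameter whose name starts with `prefix`
p <- rnn$parameters
pick <- function(prefix) p[[grep(paste0("^", prefix), names(p))[1]]]
w_ih <- pick("weight_ih")
w_hh <- pick("weight_hh")
b_ih <- pick("bias_ih")
b_hh <- pick("bias_hh")

x_t <- torch_randn(batch, input_size)
h_prev <- torch_zeros(batch, hidden_size)
c_prev <- torch_zeros(batch, hidden_size)

# All four gate pre-activations at once: (batch, 4 * hidden_size)
gates <- torch_matmul(x_t, w_ih$t()) + b_ih +
  torch_matmul(h_prev, w_hh$t()) + b_hh

# Split into the i, f, g, o blocks along the feature dimension
parts <- torch_chunk(gates, 4, dim = 2)
i_t <- torch_sigmoid(parts[[1]])
f_t <- torch_sigmoid(parts[[2]])
g_t <- torch_tanh(parts[[3]])
o_t <- torch_sigmoid(parts[[4]])

c_t <- f_t * c_prev + i_t * g_t
h_t <- o_t * torch_tanh(c_t)

# Reference: the module applied to a sequence of length 1
ref <- rnn(x_t$unsqueeze(1))
step_matches <- torch_allclose(h_t, ref[[1]][1, , ], atol = 1e-5)
print(step_matches)
```

If the assumptions hold, step_matches is TRUE, meaning that the hand-computed h_t agrees with the first (and only) time step of the module's output.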

Note

All the weights and biases are initialized from $\mathcal{U}(-\sqrt{k}, \sqrt{k})$ where $k = \frac{1}{\text{hidden\_size}}$
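
The initialization range can be checked empirically. The following sketch, which assumes the torch package is installed and uses arbitrary sizes, computes the largest absolute value of each freshly initialized parameter and compares it with sqrt(1 / hidden_size). The small tolerance is an allowance for single-precision rounding.

```r
library(torch)

hidden_size <- 20
rnn <- nn_lstm(input_size = 10, hidden_size = hidden_size, num_layers = 2)

# Upper bound of the uniform range: sqrt(k) with k = 1 / hidden_size
bound <- sqrt(1 / hidden_size)

# Largest absolute entry of every weight and bias tensor
max_abs <- sapply(rnn$parameters, function(p) p$abs()$max()$item())

within_bound <- all(max_abs <= bound + 1e-6)
print(within_bound)
```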

Inputs

Inputs: input, (h_0, c_0)

  • input of shape (seq_len, batch, input_size): tensor containing the features of the input sequence. The input can also be a packed variable length sequence. See nn_utils_rnn_pack_padded_sequence() or nn_utils_rnn_pack_sequence() for details.

  • h_0 of shape (num_layers * num_directions, batch, hidden_size): tensor containing the initial hidden state for each element in the batch.

  • c_0 of shape (num_layers * num_directions, batch, hidden_size): tensor containing the initial cell state for each element in the batch.

If (h_0, c_0) is not provided, both h_0 and c_0 default to zero.
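
A quick way to see the zero default is to compare a call without initial states against one with explicit zero tensors. This is a minimal sketch with arbitrary sizes; it relies on dropout = 0 so that both forward passes are deterministic.

```r
library(torch)
torch_manual_seed(1)

rnn <- nn_lstm(10, 20, 2)
input <- torch_randn(5, 3, 10)

# Explicit zero states: (num_layers * num_directions, batch, hidden_size)
h0 <- torch_zeros(2, 3, 20)
c0 <- torch_zeros(2, 3, 20)

out_default <- rnn(input)               # no (h_0, c_0) supplied
out_zeros <- rnn(input, list(h0, c0))   # zeros passed explicitly

same_output <- torch_allclose(out_default[[1]], out_zeros[[1]])
print(same_output)
```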

Outputs

Outputs: output, (h_n, c_n)

  • output of shape (seq_len, batch, num_directions * hidden_size): tensor containing the output features (h_t) from the last layer of the LSTM, for each t. If a packed sequence (see nn_utils_rnn_pack_padded_sequence()) has been given as the input, the output will also be a packed sequence. For the unpacked case, the directions can be separated using output$view(c(seq_len, batch, num_directions, hidden_size)), with forward and backward being index 1 and 2 along the third dimension, respectively (indexing in R torch is 1-based). Similarly, the directions can be separated in the packed case.

  • h_n of shape (num_layers * num_directions, batch, hidden_size): tensor containing the hidden state for t = seq_len. Like output, the layers can be separated using h_n$view(c(num_layers, num_directions, batch, hidden_size)) and similarly for c_n.

  • c_n of shape (num_layers * num_directions, batch, hidden_size): tensor containing the cell state for t = seq_len.
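
The view() calls described in the list above can be sketched as follows for a bidirectional, two-layer module. Sizes are arbitrary, and the two consistency checks are only an illustration of how output and h_n relate: the forward direction's final state sits at the last time step of output, while the backward direction's final state sits at the first.

```r
library(torch)

seq_len <- 5
batch <- 3
hidden_size <- 20
num_layers <- 2
num_directions <- 2

rnn <- nn_lstm(10, hidden_size, num_layers, bidirectional = TRUE)
out <- rnn(torch_randn(seq_len, batch, 10))
output <- out[[1]]
h_n <- out[[2]][[1]]

# Separate the directions of the output (1 = forward, 2 = backward)
dirs <- output$view(c(seq_len, batch, num_directions, hidden_size))
fwd <- dirs[, , 1, ]
bwd <- dirs[, , 2, ]

# Separate layers and directions of the final hidden state
layers <- h_n$view(c(num_layers, num_directions, batch, hidden_size))

# Last layer, forward: final state equals the output at t = seq_len
fwd_ok <- torch_allclose(fwd[seq_len, , ], layers[num_layers, 1, , ])
# Last layer, backward: final state equals the output at t = 1
bwd_ok <- torch_allclose(bwd[1, , ], layers[num_layers, 2, , ])
print(c(fwd_ok, bwd_ok))
```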

Attributes

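  • weight_ih_l[k] : the learnable input-hidden weights of the kth layer (W_ii|W_if|W_ig|W_io), of shape (4*hidden_size x input_size) for the first layer. For subsequent layers, the shape is (4*hidden_size x num_directions * hidden_size)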

  • weight_hh_l[k] : the learnable hidden-hidden weights of the kth layer (W_hi|W_hf|W_hg|W_ho), of shape (4*hidden_size x hidden_size)

  • bias_ih_l[k] : the learnable input-hidden bias of the kth layer (b_ii|b_if|b_ig|b_io), of shape (4*hidden_size)

  • bias_hh_l[k] : the learnable hidden-hidden bias of the kth layer (b_hi|b_hf|b_hg|b_ho), of shape (4*hidden_size)
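
To see these shapes on a concrete module, the parameters can be listed directly. The sketch below assumes the torch package is installed and that rnn$parameters returns the tensors in registration order (input-hidden weight, hidden-hidden weight, then the two biases, layer by layer); the exact parameter names are printed rather than assumed.

```r
library(torch)

rnn <- nn_lstm(input_size = 10, hidden_size = 20, num_layers = 2)

# Named list of shapes, one entry per learnable tensor
shapes <- lapply(rnn$parameters, function(p) p$shape)
str(shapes)

# First layer: input-hidden weight is (4 * hidden_size, input_size)
first_w_ih <- shapes[[1]]
# Second layer: input-hidden weight is (4 * hidden_size, hidden_size),
# because its input is the hidden state of the first layer
second_w_ih <- shapes[[5]]
```

With hidden_size = 20, every weight and bias has 80 rows, one block of 20 for each of the four gates.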

Examples

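if (torch_is_installed()) {
# 2-layer LSTM with 10 input features and 20 hidden units
rnn <- nn_lstm(10, 20, 2)
# input: (seq_len = 5, batch = 3, input_size = 10)
input <- torch_randn(5, 3, 10)
# initial states: (num_layers * num_directions = 2, batch = 3, hidden_size = 20)
h0 <- torch_randn(2, 3, 20)
c0 <- torch_randn(2, 3, 20)
# the result is list(output, list(h_n, c_n))
output <- rnn(input, list(h0, c0))
}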