How does nn.LSTM behave across batches and seq_len?

I’m currently learning to use nn.LSTM with PyTorch and want to ask how the function works.

Basically, I’m trying to feed in my dataset, which is an (M x N) matrix. Since the dataset is a matrix, I want to feed it recursively (as timesteps) into the LSTM network with a DataLoader (utils.data.Dataset).

The point where I got confused is the input size (seq_len, batch, input_size).

Let’s say I create my data_loader with batch_size=10. To generate the train_loader in the right form, I had to reshape the original (M x N) matrix so that it includes the sequence length, which can simply be done as (M/seq_len, seq_len, N).

Then the input to my nn.LSTM would have a size like (M/seq_len/batch_size, seq_len, N).

So here are my main questions:

  1. If I feed data of this size into nn.LSTM(N, hidden_size), does the LSTM already perform the recursive feed-forward over the whole batch?

  2. I'm also confused about seq_len: when seq_len > 1, the output gains a dimension of size seq_len. Does that mean the output contains the results of the recurrent operation for every step of the sequence?

I’m not sure I made the questions clear, but my understanding is getting quite messed up, lol. I hope somebody can help me sort it out.

There is 1 answer

Szymon Maszke (best answer)
  1. Yes, provided each sample's sequence length is the same (which seems to be the case here). If not, you have to pad the sequences, for example with torch.nn.utils.rnn.pad_sequence.

  2. Yes, the LSTM is unrolled over every timestep and the output already contains one entry per timestep. Hence you don't have to apply it to each element separately.
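
To make both points concrete, here is a minimal sketch of the shapes involved. The sizes (M=200, N=4, seq_len=5, hidden_size=8) and the variable names are assumptions picked for illustration, not values from the question. It also assumes batch_first=True, so each batch is (batch_size, seq_len, N); with the default layout the LSTM expects (seq_len, batch, input_size) instead. Note that M/seq_len/batch_size is the number of batches the loader yields, not a dimension of the LSTM input. The recursion runs along seq_len within each sample, while the samples in a batch are processed independently of each other.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset
from torch.nn.utils.rnn import pad_sequence

torch.manual_seed(0)

# Hypothetical sizes, chosen only for illustration.
M, N = 200, 4                      # dataset matrix: M rows (timesteps), N features
seq_len, batch_size, hidden_size = 5, 10, 8

data = torch.randn(M, N)
# Cut the matrix into non-overlapping windows: (M / seq_len, seq_len, N).
windows = data.reshape(M // seq_len, seq_len, N)
loader = DataLoader(TensorDataset(windows), batch_size=batch_size)

# batch_first=True makes the LSTM accept (batch, seq_len, input_size);
# the default layout is (seq_len, batch, input_size).
lstm = nn.LSTM(N, hidden_size, batch_first=True)

(batch,) = next(iter(loader))      # shape: (batch_size, seq_len, N)
output, (h_n, c_n) = lstm(batch)   # a single call covers every timestep

# The same computation done by hand, one timestep at a time,
# carrying the (h, c) state forward between steps.
state = None
steps = []
for t in range(seq_len):
    out_t, state = lstm(batch[:, t:t + 1, :], state)
    steps.append(out_t)
manual = torch.cat(steps, dim=1)

same_as_loop = torch.allclose(output, manual, atol=1e-5)
# For a single-layer, unidirectional LSTM the last output step equals h_n.
last_is_h_n = torch.allclose(output[:, -1, :], h_n[0], atol=1e-5)

# Sequences of different lengths must be padded before they can be stacked.
ragged = [torch.randn(3, N), torch.randn(5, N), torch.randn(2, N)]
padded = pad_sequence(ragged, batch_first=True)   # shape: (3, 5, N)

print(batch.shape, output.shape, h_n.shape, padded.shape)
print(same_as_loop, last_is_h_n)
```

If only a summary of each sequence is needed, output[:, -1, :] (or h_n) can be used; the full output keeps one hidden vector per timestep. With padded batches the LSTM still processes the padding values unless the batch is packed first, for example with torch.nn.utils.rnn.pack_padded_sequence.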