Confusion about PyTorch's LSTM implementation


As we all know, PyTorch's LSTM implementation is a stacked, optionally bidirectional LSTM.

The first layer's input is expected to have shape (L, N, H_in). If we use a bidirectional LSTM, then the output of the first layer has shape (L, N, 2*H_hiddensize) (official doc).

I can't figure out how this output is fed into the second LSTM layer. Will the outputs of the backward layer and the forward layer be merged or concatenated?

I checked the source code of its implementation (source code), but I fail to understand it:

layers = [_LSTMLayer(self.input_size, self.hidden_size,
                     self.bias, batch_first=False,
                     bidirectional=self.bidirectional, **factory_kwargs)]
for layer in range(1, num_layers):
    layers.append(_LSTMLayer(self.hidden_size, self.hidden_size,
                             self.bias, batch_first=False,
                             bidirectional=self.bidirectional,
                             **factory_kwargs))
for idx, layer in enumerate(self.layers):
    x, hxcx[idx] = layer(x, hxcx[idx])

Why can the output of the first layer (shape (L, N, 2*H_hiddensize)) be fed into the second layer, which is constructed here to expect an input of shape (L, N, H_hiddensize) rather than (L, N, 2*H_hiddensize)?
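For comparison, in the standard (non-quantizable) nn.LSTM the second layer really does expect a 2*H_hiddensize input when bidirectional=True, which can be checked directly from the parameter shapes (sizes below are made up for illustration):

```python
import torch.nn as nn

# With bidirectional=True, every layer after the first expects an input
# of size num_directions * hidden_size.
lstm = nn.LSTM(input_size=10, hidden_size=20, num_layers=2, bidirectional=True)

# weight_ih_l0 multiplies the input to the first layer:  (4*20, 10)
# weight_ih_l1 multiplies the input to the second layer: (4*20, 2*20)
print(lstm.weight_ih_l0.shape)  # torch.Size([80, 10])
print(lstm.weight_ih_l1.shape)  # torch.Size([80, 40])
```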


There are 2 answers

dhruvbird

A bi-directional LSTM can be viewed as 2 independent LSTMs that have nothing to do with each other except that they share the input tensor. The forward LSTM consumes the input in the forward direction whereas the reverse LSTM consumes it in the reverse direction (of the time dimension).
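This view can be verified with a small sketch (assuming the standard nn.LSTM parameter names): copy the weights of a bidirectional LSTM into two independent unidirectional LSTMs, run one on the input and the other on the time-reversed input, and concatenate the results.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
L, N, H_in, H = 5, 3, 4, 6  # arbitrary sizes for illustration

bi = nn.LSTM(H_in, H, bidirectional=True)
fwd = nn.LSTM(H_in, H)  # independent forward LSTM
bwd = nn.LSTM(H_in, H)  # independent reverse LSTM

# Copy the bidirectional weights into the two independent LSTMs.
with torch.no_grad():
    for name in ["weight_ih_l0", "weight_hh_l0", "bias_ih_l0", "bias_hh_l0"]:
        getattr(fwd, name).copy_(getattr(bi, name))
        getattr(bwd, name).copy_(getattr(bi, name + "_reverse"))

x = torch.randn(L, N, H_in)
out_bi, _ = bi(x)

out_f, _ = fwd(x)          # consume the input forward in time
out_b, _ = bwd(x.flip(0))  # consume the input reversed in time
out_b = out_b.flip(0)      # re-align the reverse outputs to the original time axis

# The bidirectional output is just the concatenation of the two directions.
out = torch.cat([out_f, out_b], dim=-1)
print(torch.allclose(out, out_bi, atol=1e-5))  # True
```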

Yong

I can't figure out how this output is fed into the second LSTM layer. Will the outputs of the backward layer and the forward layer be merged or concatenated?

Yes: at every time step, the output of a bidirectional LSTM concatenates the forward hidden state and the reverse hidden state for that step. In particular, the last element of output contains the final forward hidden state alongside the first-step reverse hidden state.

Reference: PyTorch LSTM documentation

For bidirectional LSTMs, h_n is not equivalent to the last element of output; the former contains the final forward and reverse hidden states, while the latter contains the final forward hidden state and the initial reverse hidden state.
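This statement from the documentation can be checked numerically (a sketch with made-up sizes): for a single bidirectional layer, h_n[0] is the forward state from the last time step, while h_n[1] is the reverse direction's final state, which appears in the first time step of output.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
L, N, H = 5, 3, 6  # arbitrary sizes for illustration
lstm = nn.LSTM(input_size=4, hidden_size=H, bidirectional=True)
x = torch.randn(L, N, 4)

output, (h_n, c_n) = lstm(x)  # output: (L, N, 2*H), h_n: (2, N, H)

# Final forward state: last time step, first H channels of output.
print(torch.allclose(output[-1, :, :H], h_n[0]))  # True
# Final reverse state: FIRST time step, last H channels of output.
print(torch.allclose(output[0, :, H:], h_n[1]))   # True
# So output[-1] is not simply h_n stacked: its reverse half is the
# reverse direction's state after seeing only x[-1].
```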