As is well known, PyTorch's LSTM implementation supports stacking multiple layers and running bidirectionally.
The first layer's input is expected to have shape (L, N, H_in). If we use a bidirectional LSTM, the output of the first layer has shape (L, N, 2*H_hidden) (see the official doc).
I can't figure out how this output is fed into the second LSTM layer. Are the outputs of the backward and forward directions merged (e.g. summed) or concatenated?
I checked the source code of its implementation (source code), but I fail to understand it.
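For concreteness, here is a minimal shape check; the sizes are arbitrary, chosen only to make the shapes visible:

```python
import torch
import torch.nn as nn

L, N, H_in, H_hidden = 5, 3, 10, 20  # seq len, batch, input size, hidden size

lstm = nn.LSTM(input_size=H_in, hidden_size=H_hidden,
               num_layers=2, bidirectional=True)

x = torch.randn(L, N, H_in)
output, (h_n, c_n) = lstm(x)

print(output.shape)  # torch.Size([5, 3, 40]) -> (L, N, 2 * H_hidden)
print(h_n.shape)     # torch.Size([4, 3, 20]) -> (num_layers * 2, N, H_hidden)
```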
```python
# In __init__: the first layer maps input_size -> hidden_size,
# every subsequent layer maps hidden_size -> hidden_size.
layers = [_LSTMLayer(self.input_size, self.hidden_size,
                     self.bias, batch_first=False,
                     bidirectional=self.bidirectional, **factory_kwargs)]
for layer in range(1, num_layers):
    layers.append(_LSTMLayer(self.hidden_size, self.hidden_size,
                             self.bias, batch_first=False,
                             bidirectional=self.bidirectional,
                             **factory_kwargs))

# In forward: each layer's output is passed directly to the next layer.
for idx, layer in enumerate(self.layers):
    x, hxcx[idx] = layer(x, hxcx[idx])
```
Why can the output of the first layer (shape (L, N, 2*H_hidden)) be fed into the second layer, which is constructed to expect inputs of shape (L, N, H_hidden) rather than (L, N, 2*H_hidden)?
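For comparison, the standard nn.LSTM does size its stacked layers for the doubled input when bidirectional=True; a quick check with arbitrary sizes:

```python
import torch.nn as nn

lstm = nn.LSTM(input_size=10, hidden_size=20, num_layers=2, bidirectional=True)

# weight_ih_l0 maps the raw input; weight_ih_l1 maps the first layer's output.
print(lstm.weight_ih_l0.shape)  # torch.Size([80, 10]) -> (4 * hidden_size, input_size)
print(lstm.weight_ih_l1.shape)  # torch.Size([80, 40]) -> (4 * hidden_size, 2 * hidden_size)
```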
A bidirectional LSTM can be viewed as two independent LSTMs that have nothing to do with each other except that they share the input tensor. The forward LSTM consumes the input in the forward direction, whereas the reverse LSTM consumes it in the reverse direction along the time dimension.
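To make that view concrete, here is a sketch that rebuilds a single bidirectional layer out of two plain unidirectional LSTMs that share its weights (the parameter names are the ones nn.LSTM exposes; the sizes are arbitrary):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
L, N, H_in, H = 7, 2, 4, 3

bi = nn.LSTM(H_in, H, num_layers=1, bidirectional=True)

# Two plain LSTMs that reuse the bidirectional module's parameters.
fwd = nn.LSTM(H_in, H)
bwd = nn.LSTM(H_in, H)
with torch.no_grad():
    for name in ["weight_ih_l0", "weight_hh_l0", "bias_ih_l0", "bias_hh_l0"]:
        getattr(fwd, name).copy_(getattr(bi, name))
        getattr(bwd, name).copy_(getattr(bi, name + "_reverse"))

x = torch.randn(L, N, H_in)
out_bi, _ = bi(x)

out_f, _ = fwd(x)            # forward LSTM reads t = 0 .. L-1
out_b, _ = bwd(x.flip(0))    # reverse LSTM reads t = L-1 .. 0
out_b = out_b.flip(0)        # re-align to forward time order

# The bidirectional output is the concatenation of the two along the feature dim.
print(torch.allclose(out_bi, torch.cat([out_f, out_b], dim=-1), atol=1e-6))  # True
```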