LSTM seq2seq input and output with different number of time steps

I am new to this field and am currently working on a video action prediction project using Keras. The input data takes 10% of the frames of each video and collapses each run of identical successive actions into a single action, for example [0,0,0,1,1,1,2] -> [0,1,2]. After padding and one-hot encoding, the shape of the input data is (1460, 6, 48) -> (number of videos, number of actions, one-hot encoded form for 48 actions). I would like to predict all future actions for each video. The shape of the output should be (1460, 23, 48) -> (number of videos, max timesteps, one-hot encoded form for 48 actions).
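To make the preprocessing concrete, here is a minimal sketch of the collapse, pad and one-hot steps in plain Python. The helper names `collapse_runs` and `one_hot_padded` are hypothetical, and it assumes that padded steps are all-zero vectors and that sequences longer than the maximum length are truncated. Neither detail is stated above.

```python
from itertools import groupby

NUM_ACTIONS = 48  # number of action classes, from the shapes above
MAX_LEN = 6       # padded input length, from the shapes above


def collapse_runs(frame_labels):
    """Collapse each run of identical successive labels into one label."""
    return [label for label, _ in groupby(frame_labels)]


def one_hot_padded(actions, max_len=MAX_LEN, num_actions=NUM_ACTIONS):
    """One-hot encode a list of action ids into max_len rows.

    Assumption: padded steps stay all-zero and extra steps are truncated.
    """
    rows = [[0.0] * num_actions for _ in range(max_len)]
    for step, action in enumerate(actions[:max_len]):
        rows[step][action] = 1.0
    return rows


actions = collapse_runs([0, 0, 0, 1, 1, 1, 2])
print(actions)  # [0, 1, 2]

encoded = one_hot_padded(actions)
print(len(encoded), len(encoded[0]))  # 6 48
```

Stacking one such `encoded` matrix per video would give the (1460, 6, 48) input array.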

Here is my current approach, which does not work.

def lstm_model(frame_len, max_timesteps):

    model = Sequential()
    model.add(LSTM(100, input_shape=(None,48), return_sequences=True))
    model.add(Dense(48, activation='tanh'))
    model.compile(loss='mae', optimizer='adam', metrics=['accuracy'])
    return model



I would like to know whether I have to keep the number of timesteps the same for input and output. If not, how could I modify the model to fit such data?

Any help would be appreciated.


You can do something like this:

  1. Encode your input data with an LSTM
  2. Repeat the encoded vector once for each output time step
  3. Decode the repeated vector with a second LSTM

In Keras, it looks like this:

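from tensorflow.keras import layers, models

# Example sizes, chosen to match the model summary shown below
input_timesteps, input_features = 10, 2
output_timesteps, output_features = 3, 1
units = 100

# Encoder: compress the whole input sequence into one vector
encoder_inputs = layers.Input(shape=(input_timesteps, input_features))
encoder = layers.LSTM(units, return_sequences=False)(encoder_inputs)
# Repeat that vector once per output time step, then decode it
decoder = layers.RepeatVector(output_timesteps)(encoder)
decoder = layers.LSTM(units, return_sequences=True)(decoder)
out = layers.TimeDistributed(layers.Dense(output_features))(decoder)
model = models.Model(encoder_inputs, out)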

Calling model.summary() then gives:

Layer (type)                 Output Shape              Param #   
input_1 (InputLayer)         [(None, 10, 2)]           0         
lstm (LSTM)                  (None, 100)               41200     
repeat_vector (RepeatVector) (None, 3, 100)            0         
lstm_1 (LSTM)                (None, 3, 100)            80400     
time_distributed (TimeDistri (None, 3, 1)              101       
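Applied to the shapes in the question, the same pattern might look like the sketch below. The sizes 6, 23 and 48 come from the question. The softmax output and categorical cross-entropy loss are my own suggestion for one-hot targets, not something the question uses, and the random array only stands in for the real data.

```python
import numpy as np
from tensorflow.keras import layers, models

input_timesteps = 6    # observed actions per video (from the question)
output_timesteps = 23  # future actions to predict (from the question)
num_actions = 48       # one-hot width (from the question)
units = 100            # same LSTM size as in the question

encoder_inputs = layers.Input(shape=(input_timesteps, num_actions))
encoded = layers.LSTM(units)(encoder_inputs)               # (batch, units)
repeated = layers.RepeatVector(output_timesteps)(encoded)  # (batch, 23, units)
decoded = layers.LSTM(units, return_sequences=True)(repeated)
# Assumption: softmax, so each time step is a distribution over actions
outputs = layers.TimeDistributed(
    layers.Dense(num_actions, activation="softmax")
)(decoded)

model = models.Model(encoder_inputs, outputs)
model.compile(loss="categorical_crossentropy", optimizer="adam",
              metrics=["accuracy"])

# Random stand-in for 4 videos, only to check the output shape
x = np.random.rand(4, input_timesteps, num_actions).astype("float32")
pred = model.predict(x, verbose=0)
print(pred.shape)  # (4, 23, 48)
```

Training would then be a call such as model.fit(X, Y, ...) with X of shape (1460, 6, 48) and Y of shape (1460, 23, 48).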

If you want to carry the encoder's final hidden and cell states over to the decoder, set return_state=True on the encoder LSTM and pass the returned states to the decoder as its initial state. Check this question.
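A rough sketch of that variant, reusing the example sizes from the summary above. The variable names are illustrative, and the decoder needs the same number of units as the encoder so that the state shapes match.

```python
import numpy as np
from tensorflow.keras import layers, models

input_timesteps, input_features = 10, 2
output_timesteps, output_features = 3, 1
units = 100  # encoder and decoder must agree so the states are compatible

encoder_inputs = layers.Input(shape=(input_timesteps, input_features))
# return_state=True also returns the final hidden and cell states
encoder_out, state_h, state_c = layers.LSTM(units, return_state=True)(
    encoder_inputs
)

decoder_in = layers.RepeatVector(output_timesteps)(encoder_out)
# Start the decoder from the encoder's final states
decoder_out = layers.LSTM(units, return_sequences=True)(
    decoder_in, initial_state=[state_h, state_c]
)
out = layers.TimeDistributed(layers.Dense(output_features))(decoder_out)

state_model = models.Model(encoder_inputs, out)

# Dummy batch of 2 sequences, only to check the output shape
y = state_model.predict(
    np.zeros((2, input_timesteps, input_features), dtype="float32"),
    verbose=0,
)
print(y.shape)  # (2, 3, 1)
```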


While you don't have to keep them the same, you do need to add fully connected layers after LSTM to change dimensions, or use MaxPool2D or similar types of layers.