LSTM seq2seq input and output with different number of time steps


I am new to this field and am currently working on a video action prediction project using Keras. The input data takes the first 10% of frames of each video and collapses all identical successive actions into a single action; for example, [0,0,0,1,1,1,2] -> [0,1,2]. After applying padding and one-hot encoding, the shape of the input data is (1460, 6, 48) -> (number of videos, number of actions, one-hot encoding of the 48 actions). I would like to predict all future actions for each video, so the shape of the output should be (1460, 23, 48) -> (number of videos, max timesteps, one-hot encoding of the 48 actions).
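
For context, the preprocessing is roughly along these lines (a simplified sketch, not my exact code; the helper names and the use of pad_sequences/to_categorical are illustrative):

from itertools import groupby
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.utils import to_categorical

NUM_ACTIONS = 48

def collapse_repeats(actions):
    # merge runs of identical successive actions, e.g. [0,0,0,1,1,1,2] -> [0,1,2]
    return [label for label, _ in groupby(actions)]

def encode(sequences, max_timesteps):
    # pad the collapsed sequences to a fixed length, then one-hot encode them
    collapsed = [collapse_repeats(seq) for seq in sequences]
    padded = pad_sequences(collapsed, maxlen=max_timesteps, padding='post')
    return to_categorical(padded, num_classes=NUM_ACTIONS)

# encode(observed_actions, max_timesteps=6) gives the (1460, 6, 48) input described above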

Here is my current approach, which does not work.

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

def lstm_model(frame_len, max_timesteps):
    model = Sequential()
    # return_sequences=True emits one output vector per *input* timestep
    model.add(LSTM(100, input_shape=(None, 48), return_sequences=True))
    model.add(Dense(48, activation='tanh'))
    model.compile(loss='mae', optimizer='adam', metrics=['accuracy'])
    model.summary()
    return model


I would like to know whether I have to keep the number of timesteps the same for the input and output. If not, how could I modify the model to fit such data?

Any help would be appreciated.

There are 2 answers

B Douchet (best answer)

You can do something like this:

  1. Encode your input data with an LSTM
  2. Repeat the encoded vector the required number of times
  3. Decode the repeated sequence with a second LSTM

In Keras, it looks like this:

from tensorflow.keras import layers, models

input_timesteps=10
input_features=2
output_timesteps=3
output_features=1
units=100

#Input
encoder_inputs = layers.Input(shape=(input_timesteps,input_features))

#Encoder
encoder = layers.LSTM(units, return_sequences=False)(encoder_inputs)

#Repeat
decoder = layers.RepeatVector(output_timesteps)(encoder)

#Decoder
decoder = layers.LSTM(units, return_sequences=True)(decoder)

#Output
out = layers.TimeDistributed(layers.Dense(output_features))(decoder)

model = models.Model(encoder_inputs, out)

It gives you:

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_1 (InputLayer)         [(None, 10, 2)]           0         
_________________________________________________________________
lstm (LSTM)                  (None, 100)               41200     
_________________________________________________________________
repeat_vector (RepeatVector) (None, 3, 100)            0         
_________________________________________________________________
lstm_1 (LSTM)                (None, 3, 100)            80400     
_________________________________________________________________
time_distributed (TimeDistri (None, 3, 1)              101       
=================================================================
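
To map this template onto the shapes in your question, it could look roughly like the untested sketch below (with one-hot action targets, a softmax output and categorical cross-entropy are usually a better fit than mae/tanh):

input_timesteps = 6      # padded length of the observed action sequences
input_features = 48      # one-hot size
output_timesteps = 23    # max number of future actions to predict
output_features = 48     # one-hot size

encoder_inputs = layers.Input(shape=(input_timesteps, input_features))
encoder = layers.LSTM(units, return_sequences=False)(encoder_inputs)
decoder = layers.RepeatVector(output_timesteps)(encoder)
decoder = layers.LSTM(units, return_sequences=True)(decoder)
out = layers.TimeDistributed(layers.Dense(output_features, activation='softmax'))(decoder)

model = models.Model(encoder_inputs, out)
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
# model.fit(X, y) with X of shape (1460, 6, 48) and y of shape (1460, 23, 48)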

If you want to keep the cell state from the encoder and reuse it in the decoder, you can do that with return_state=True; check this related question.
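
A minimal sketch of that variant, reusing the variables above (the encoder's final hidden and cell states initialise the decoder LSTM):

encoder_inputs = layers.Input(shape=(input_timesteps, input_features))
# return_state=True also returns the final hidden state h and cell state c
encoder_outputs, state_h, state_c = layers.LSTM(units, return_state=True)(encoder_inputs)

decoder = layers.RepeatVector(output_timesteps)(encoder_outputs)
# initialise the decoder LSTM with the encoder's final states
decoder = layers.LSTM(units, return_sequences=True)(decoder, initial_state=[state_h, state_c])
out = layers.TimeDistributed(layers.Dense(output_features))(decoder)

model = models.Model(encoder_inputs, out)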

topgunner

While you don't have to keep them the same, you do need to add fully connected (Dense) layers after the LSTM to change the output dimensions, or use pooling layers such as MaxPool2D. A rough sketch of this idea follows below.
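
One way to read that suggestion in Keras, using the shapes from the question (layer sizes are illustrative only, and this is not the only option):

from tensorflow.keras import layers, models

inputs = layers.Input(shape=(6, 48))        # (input timesteps, one-hot size)
x = layers.LSTM(100)(inputs)                # single vector summarising the input sequence
x = layers.Dense(23 * 48)(x)                # fully connected layer changes the dimensions
x = layers.Reshape((23, 48))(x)             # (output timesteps, one-hot size)
outputs = layers.Softmax(axis=-1)(x)        # per-timestep distribution over the 48 actions

model = models.Model(inputs, outputs)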