I am new to this field and am currently working on a video action prediction project using Keras. The input data takes 10% of the frames of each video and collapses every run of identical successive actions into a single action, e.g. [0,0,0,1,1,1,2] -> [0,1,2]. After padding and one-hot encoding, the shape of the input data is (1460, 6, 48) -> (number of videos, number of actions, one-hot encoding of the 48 actions). I would like to predict all future actions for each video, so the output should have shape (1460, 23, 48) -> (number of videos, max timesteps, one-hot encoding of the 48 actions).
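For reference, a minimal sketch of that preprocessing, assuming the raw labels are one list of per-frame action IDs per video (the variable names and the 'post' padding side are illustrative, not from the original project):

from itertools import groupby
from keras.preprocessing.sequence import pad_sequences
from keras.utils import to_categorical

raw_action_labels = [[0, 0, 0, 1, 1, 1, 2], [3, 3, 4]]  # toy stand-in for the real labels

# collapse each run of identical successive actions: [0,0,0,1,1,1,2] -> [0,1,2]
sequences = [[label for label, _ in groupby(video)] for video in raw_action_labels]

# pad to a common length, then one-hot encode the 48 action classes
padded = pad_sequences(sequences, maxlen=6, padding='post')  # -> (n_videos, 6)
inputs = to_categorical(padded, num_classes=48)              # -> (n_videos, 6, 48)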
Here is my current approach, which does not work.
from keras.models import Sequential
from keras.layers import LSTM, Dense

def lstm_model(frame_len, max_timesteps):
    model = Sequential()
    # return_sequences=True emits one output per input timestep, so the
    # output sequence is always exactly as long as the input sequence
    model.add(LSTM(100, input_shape=(None, 48), return_sequences=True))
    model.add(Dense(48, activation='tanh'))
    model.compile(loss='mae', optimizer='adam', metrics=['accuracy'])
    model.summary()
    return model
I would like to know whether the number of timesteps has to be the same for the input and the output. If not, how can I modify the model to fit such data?
Any help would be appreciated.
You can do something like this: use an encoder-decoder (sequence-to-sequence) model, where an encoder LSTM compresses the input sequence into a fixed-size vector and a decoder LSTM unrolls that vector into an output sequence of a different, fixed length. In Keras, it looks like the sketch below.
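This is a minimal sketch, assuming a RepeatVector-based decoder; the layer sizes are illustrative, and softmax with categorical cross-entropy is the usual pairing for one-hot targets:

from keras.models import Sequential
from keras.layers import LSTM, Dense, RepeatVector, TimeDistributed

max_timesteps = 23  # number of future actions to predict

model = Sequential()
# encoder: summarize the whole input sequence in one vector
model.add(LSTM(100, input_shape=(None, 48)))                  # -> (batch, 100)
# repeat that summary once per output timestep
model.add(RepeatVector(max_timesteps))                        # -> (batch, 23, 100)
# decoder: unroll the repeated summary into an output sequence
model.add(LSTM(100, return_sequences=True))                   # -> (batch, 23, 100)
# one softmax over the 48 action classes at every timestep
model.add(TimeDistributed(Dense(48, activation='softmax')))   # -> (batch, 23, 48)
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])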
This gives you an output of shape (batch_size, 23, 48) regardless of the number of input timesteps, so the input and output sequence lengths do not have to match.
If you want to keep the cell state from the encoder and reuse it in the decoder, you can do that with return_state=True. Check this question.
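A sketch of that variant with the functional API, assuming the encoder's final hidden and cell states are passed to the decoder through initial_state (the variable names are illustrative):

from keras.models import Model
from keras.layers import Input, LSTM, Dense, RepeatVector, TimeDistributed

inputs = Input(shape=(None, 48))
# return_state=True additionally returns the final hidden state h and cell state c
encoder_out, state_h, state_c = LSTM(100, return_state=True)(inputs)
repeated = RepeatVector(23)(encoder_out)
# start the decoder from the encoder's final states instead of zeros
decoded = LSTM(100, return_sequences=True)(repeated, initial_state=[state_h, state_c])
outputs = TimeDistributed(Dense(48, activation='softmax'))(decoded)
model = Model(inputs, outputs)
model.compile(loss='categorical_crossentropy', optimizer='adam')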