I am working on a time-series sequence classification problem.

I have 80 time series, each of length 1002. Each sequence corresponds to 1 of 4 categories (copper, cadmium, lead, mercury). I want to model this with Keras LSTMs, which require input of shape `[batches, timesteps, features]`. Since each sequence is independent, the most basic setup is for `X_train` to have shape `[80, 1002, 1]`. This works fine in an LSTM (with `stateful=False`).
But 1002 is quite a long sequence length, and a smaller chunk size might perform better.
Say I split each sequence into 3 chunks of length 334. I could continue to use a stateless LSTM, but (I think?) it makes sense to keep the LSTM stateful across the 3 chunks and then reset the state, since the 3 chunks of one sequence are related.
How do I implement this in Keras?
First, I transform the data into shape `[240, 334, 1]` using a simple `X_train.reshape(-1, 334, 1)`, but how do I maintain the state for 3 samples and then reset the state in `model.fit()`?
I know I need to call `model.reset_states()` somewhere, but I couldn't find any sample code showing how to do it. Do I have to subclass a model? Can I do this using `for epoch in range(num_epochs)` and `GradientTape`? What are my options? How can I implement this?
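For example, here is a rough sketch of the kind of loop I imagine, assuming a stateful model built with `batch_input_shape=(1, 334, 1)`, labels repeated once per chunk, and placeholder `loss_fn`/`optimizer` choices. I don't know if this is the right approach:

```python
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

model = Sequential([
    LSTM(10, batch_input_shape=(1, 334, 1), stateful=True),  # one chunk at a time
    Dense(4, activation='softmax')
])
loss_fn = tf.keras.losses.CategoricalCrossentropy()
optimizer = tf.keras.optimizers.RMSprop()
num_epochs = 3

for epoch in range(num_epochs):
    for i in range(0, len(X_train), 3):        # 3 consecutive chunks = 1 sequence
        with tf.GradientTape() as tape:
            loss = 0.0
            for j in range(3):                 # feed the 3 related chunks in order
                x = X_train[i + j][None, ...]  # shape (1, 334, 1)
                y = y_train[i + j][None, ...]  # label repeated per chunk
                loss += loss_fn(y, model(x, training=True))
        grads = tape.gradient(loss, model.trainable_variables)
        optimizer.apply_gradients(zip(grads, model.trainable_variables))
        model.reset_states()                   # next 3 chunks are a new sequence
```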
Also, if I split the sequences up, what do I do with the labels? Do I repeat each label once per chunk (3 copies in this case)? Is there a way for an LSTM to ingest 3 samples and then emit one prediction? Or does each sample have to correspond to a prediction?
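If repeating labels is the answer, I assume (my guess, not verified) that something like this NumPy snippet would line the chunks and labels up:

```python
import numpy as np

# Each of the 80 sequences becomes 3 consecutive chunks, so repeat each
# one-hot label 3 times to match (assuming y_train has shape (80, 4)):
X_chunks = X_train.reshape(-1, 334, 1)     # (240, 334, 1)
y_chunks = np.repeat(y_train, 3, axis=0)   # (240, 4)
```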
Finally, if I split my sequences into 3 subsequences each, do I have to use a batch size of 3? Or can I choose any multiple of 3?
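Part of why I ask: my (possibly mistaken) understanding is that with `stateful=True`, sample `i` of one batch continues from sample `i` of the previous batch, which would mean laying the chunks out like this hypothetical sketch rather than consecutively:

```python
# Batch 0 = chunk 0 of all 80 sequences, batch 1 = chunk 1, batch 2 = chunk 2,
# so with batch_size=80 and shuffle=False the state lines up across batches:
X_stateful = (X_train.reshape(80, 3, 334, 1)
                     .transpose(1, 0, 2, 3)   # (3, 80, 334, 1)
                     .reshape(240, 334, 1))
```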
Here is the super basic code I used with `X_train.shape == (80, 1002, 1)`:
```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

model = Sequential([
    LSTM(10, batch_input_shape=(10, 1002, 1)),  # 10 samples per batch
    Dense(4, activation='softmax')              # softmax for 4 exclusive classes
])
model.compile(loss='categorical_crossentropy',
              optimizer='rmsprop',
              metrics=['accuracy'])
model.fit(X_train, y_train, epochs=3, batch_size=10, shuffle=False)
```
I know there are loads of questions here, happy to make separate ones if this is too much for one.
The easy solution is to reshape the data from having 1 feature to having 3: turn `[80, 1002, 1]` into `[80, 334, 3]` rather than `[240, 334, 1]`. This keeps the number of samples the same, so you don't have to mess around with statefulness, and you can keep using the normal `fit()` API.
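A minimal sketch of that reshape, assuming `X_train` has shape `(80, 1002, 1)` (the transpose variant is my own addition, in case you want the three 334-step chunks laid side by side as features rather than consecutive triples):

```python
import numpy as np

# Plain reshape: timestep t holds the 3 consecutive original values
# 3t, 3t+1, 3t+2 as its 3 features
X_new = X_train.reshape(80, 334, 3)

# Alternative chunk layout (my assumption): feature c at timestep t is the
# original value at position 334*c + t, i.e. the 3 chunks run in parallel
X_alt = X_train.reshape(80, 3, 334).transpose(0, 2, 1)  # (80, 334, 3)
```

Either way, the LSTM then takes `input_shape=(334, 3)` and the labels stay one per sequence.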