Lack of Variability in Predictions from Multivariate LSTM Model

I've been working on a multivariate LSTM model for time series forecasting, but I'm encountering an issue where the predicted output doesn't exhibit enough variability or 'ups and downs'. The predictions tend to be too smooth or flat, particularly after the first predicted point. Here's a brief overview of my model architecture:

import tensorflow as tf
from tensorflow.keras.layers import LSTM, Dense, Dropout, Reshape

model = tf.keras.Sequential()
model.add(tf.keras.layers.InputLayer(input_shape=(60, 25)))
model.add(LSTM(256, return_sequences=True))
model.add(tf.keras.layers.LayerNormalization())
model.add(Dropout(0.2))
model.add(LSTM(256, return_sequences=True))
model.add(tf.keras.layers.LayerNormalization())
model.add(Dropout(0.2))
model.add(LSTM(256, return_sequences=False))
model.add(tf.keras.layers.LayerNormalization())
model.add(Dropout(0.2))
# 90 outputs, reshaped below to 30 future steps x 3 target features
model.add(Dense(30 * 3, activation=tf.keras.layers.LeakyReLU(alpha=0.1)))
model.add(Reshape([30, 3]))

What I am trying to achieve is for the output layer to predict 90 points, which are then reshaped into 30 time steps for three target variables. Regarding the data:

  • 670 000 rows, 25 features
  • using 60 past points to predict 30 points into the future for 3 target features
  • the dataset is first split into training, validation, and test sets (70:20:10) using a sliding window
  • each sliding window is shifted by one row (shift = 1); a sketch of the windowing is shown after the plot below
  • using StandardScaler for scaling, since the data contains a couple of anomalies that I would like to detect afterwards in the results
  • The graph of predicted vs. true values for one feature can be seen below:

[Plot of predicted vs. true values for one target feature]
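
For reference, here is a rough sketch of the windowing described above (the function and array names are illustrative, not my exact code):

import numpy as np

# 60 past steps of all 25 features in, the next 30 steps of the
# 3 target columns out, advancing the window one row at a time.
def make_windows(features, targets, n_past=60, n_future=30, shift=1):
    X, y = [], []
    for start in range(0, len(features) - n_past - n_future + 1, shift):
        X.append(features[start : start + n_past])
        y.append(targets[start + n_past : start + n_past + n_future])
    return np.array(X), np.array(y)  # shapes (n, 60, 25) and (n, 30, 3)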

My question is: did I handle the layers after the last LSTM correctly? My goal is a one-shot prediction of 30 values for each of the 3 target features.

What I tried:

  • Hyperparameter tuning (see the tuner sketch after this list), covering
    • Number of layers
    • Optimizer (RMSprop, Adam, SGD)
    • Number of units in LSTM/GRU
    • Dropout rate
    • Learning rate (0.01, 0.001, 0.005)
    • Batch size of sliding windows (64, 128, 256)
  • Warm-up for the first 4 layers
  • Other scaling methods (QuantileTransformer, MinMaxScaler, PowerTransformer)
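
Roughly, such a sweep could be wired up with Keras Tuner like this (the ranges and names below are illustrative, not my exact configuration):

import tensorflow as tf
import keras_tuner as kt

def build_model(hp):
    model = tf.keras.Sequential()
    model.add(tf.keras.layers.InputLayer(input_shape=(60, 25)))
    # Tune depth, width, dropout, and learning rate in one search.
    for _ in range(hp.Int('num_lstm_layers', 1, 3)):
        model.add(tf.keras.layers.LSTM(hp.Choice('units', [64, 128, 256]),
                                       return_sequences=True))
        model.add(tf.keras.layers.Dropout(hp.Float('dropout', 0.0, 0.5, step=0.1)))
    model.add(tf.keras.layers.LSTM(hp.Choice('units', [64, 128, 256])))
    model.add(tf.keras.layers.Dense(30 * 3))
    model.add(tf.keras.layers.Reshape([30, 3]))
    model.compile(optimizer=tf.keras.optimizers.Adam(
                      hp.Choice('learning_rate', [0.01, 0.005, 0.001])),
                  loss='mse')
    return model

tuner = kt.RandomSearch(build_model, objective='val_loss', max_trials=10)
# tuner.search(X_train, y_train, validation_data=(X_val, y_val),
#              epochs=20, batch_size=128)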

Link to the LSTM documentation: https://www.tensorflow.org/api_docs/python/tf/keras/layers/LSTM

There are 2 answers

Answer from mrk:

Output Activation Function: First, the most striking one to me (also already mentioned in the comments by @DrSnoopy): ensure that the output activation function is appropriate for your regression task. Since you're predicting continuous values, a linear activation is often a good choice.

model.add(Dense(30 * 3, activation='linear'))

Loss Function: Confirm that your loss function is suitable for regression. Mean Squared Error (MSE) is commonly used for regression tasks. You did not mention what you are using, but this really is an important choice.

model.compile(loss='mean_squared_error', optimizer=optimizer)  # e.g. optimizer='adam'

Normalization/Scaling: You said you are doing some normalization. a) Make sure your scaling is applied consistently during both training and inference. b) The standardization statistics should be computed from the training set only (see the sketch below). This is something you should always do: it is easy to get right and helps your model converge faster and more stably.
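
A minimal sketch of that fit-on-train-only pattern (the array names and the window shape (n, 60, 25) are assumptions about your pipeline):

from sklearn.preprocessing import StandardScaler

# Fit the scaler on the training split only, then reuse it for validation,
# test, and inference so every split shares the same statistics.
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train.reshape(-1, 25)).reshape(X_train.shape)
X_val   = scaler.transform(X_val.reshape(-1, 25)).reshape(X_val.shape)
X_test  = scaler.transform(X_test.reshape(-1, 25)).reshape(X_test.shape)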

Model Complexity: Consider whether your model is complex enough to capture the underlying patterns in your data. If the predictions are too smooth, it could be an indication that the model is not able to capture the variability in your data, e.g. try adding layers.

Or..

Sequence Length: Experiment with the length of the input sequence (60 in your case). You could try decreasing it to see if it has an impact on the model's ability to capture patterns (while keeping your capacity / number of units constant).

Answer from Ritik Kumar:

It looks like you've put a good deal of effort into optimizing your multivariate LSTM model for time series forecasting. The issue you're facing with the predicted output being too smooth or flat is a common challenge in time series forecasting. Here are a few suggestions you might consider:

Adjust the Model Complexity:

Try increasing the complexity of your model. You can experiment with adding more LSTM layers or increasing the number of units in each layer. A more complex model may capture more intricate patterns in the data.

Stacked LSTM Layers:

Consider stacking more LSTM layers with return_sequences=True. This allows the subsequent layers to receive the full sequence of outputs from the previous layers and might help the model capture more detailed temporal dependencies.

Learning Rate:

Revisit your learning rate. While you've experimented with different optimizers, try adjusting the learning rate for the chosen optimizer. A smaller learning rate might allow finer adjustments.

Training Duration:

Check whether your model is trained long enough. Sometimes increasing the number of epochs can lead to better results; however, be cautious about overfitting.

Loss Function:

Experiment with different loss functions. You might consider one that penalizes larger errors more heavily, especially if you're interested in capturing the 'ups and downs' accurately (a hedged sketch of such a loss follows at the end of this answer).

Feature Engineering:

Explore whether there are additional features or transformations you can apply to your input data that might help the model capture more nuanced patterns.

Ensemble Models:

Consider using ensemble techniques, such as combining the predictions of multiple models. This can often improve robustness and capture different aspects of the data.

Sequence Length:

Try different sequence lengths. The length of the input sequence might affect the model's ability to capture patterns in the data.

Remember to monitor the model's performance on both the training and validation sets to avoid overfitting. Experiment with these suggestions and observe how the changes impact the predicted outputs. Finding the optimal architecture and hyperparameters for your forecasting task is usually an iterative process.
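
As a concrete illustration of the loss-function and training-duration points above, here is a minimal, hypothetical sketch (the exponent, patience, and array names are assumptions, not values from the question):

import tensorflow as tf

# Hypothetical loss that penalizes large errors more strongly than MSE
# by raising the absolute error to a power greater than 2.
def emphasized_error(power=3.0):
    def loss(y_true, y_pred):
        return tf.reduce_mean(tf.pow(tf.abs(y_true - y_pred), power))
    return loss

model.compile(optimizer='adam', loss=emphasized_error(3.0))

# Early stopping lets you train "long enough" without sliding into overfitting.
model.fit(X_train, y_train, validation_data=(X_val, y_val), epochs=200,
          callbacks=[tf.keras.callbacks.EarlyStopping(patience=10,
                                                      restore_best_weights=True)])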