I am trying to predict wave heights using a Keras LSTM in Python 3.9. Just for the ease of my example here, I only use two features: significant wave height (Hm0) and H1/3 (wave height is of course dependent on multiple other factors). For a reason I can't figure out, I get an error that y_pred and y_true do not match in shape:
```
ValueError: Dimensions must be equal, but are 126 and 24 for '{{node mean_absolute_error/sub}} = Sub[T=DT_FLOAT](sequential/dense/BiasAdd, IteratorGetNext:1)' with input shapes: [?,126,2], [?,24,2].
```
How I set up my model: I use data from multiple buoys, of which one buoy is my output and the other three are my input. The dataframes of the buoys look like this:
| Datetime | Hm0 | H1/3 |
|---|---|---|
| 2022-08-01 00:10:00 | 85.0 | 85.0 |
| 2022-08-01 00:20:00 | 90.0 | 90.0 |
| 2022-08-01 00:30:00 | 93.0 | 91.0 |
| 2022-08-01 00:40:00 | 92.0 | 91.0 |
| 2022-08-01 00:50:00 | 89.0 | 88.0 |
| ... | ... | ... |
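For context, a minimal sketch of how one of these buoy dataframes could be loaded (the file name `buoy_1.csv` is just a placeholder, not my actual data source):

```python
import pandas as pd

# Hypothetical loading step: one CSV per buoy, with a datetime column
# and the two wave-height features used in this example.
buoy_df = pd.read_csv("buoy_1.csv", parse_dates=["Datetime"], index_col="Datetime")
buoy_df = buoy_df[["Hm0", "H1/3"]]  # keep only the two features
```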
I create sequences for the output buoy with a length of seq_length_future and for the input buoys with a length of seq_length_past (so the input and output sequences differ in length). All input buoys are put into input_data as a NumPy array and the output buoy is put into output_data. The sequence lengths are:
```python
seq_length_past = 42
seq_length_future = 24
```
The shapes of the input and output data after creating the sequences are:
```
sequenced_input_data.shape = (61117, 3, 42, 2)
sequenced_output_data.shape = (61117, 24, 2)
```
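For reference, this is roughly how the windowing works; the helper below is a sketch with an assumed name (`create_sequences`), not my literal code:

```python
import numpy as np

def create_sequences(input_arrays, output_array, seq_past, seq_future):
    """For each time step t, take the last seq_past steps of every input
    buoy and the next seq_future steps of the output buoy."""
    X, y = [], []
    for t in range(seq_past, len(output_array) - seq_future + 1):
        # one (n_buoys, seq_past, n_features) block per sample
        X.append([buoy[t - seq_past:t] for buoy in input_arrays])
        y.append(output_array[t:t + seq_future])
    return np.array(X), np.array(y)

# input_arrays: list of 3 arrays, each shaped (n_samples, 2)
# output_array: array shaped (n_samples, 2)
# -> X.shape == (n_windows, 3, 42, 2), y.shape == (n_windows, 24, 2)
```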
For the input data this results in a 4D array. Using NumPy's reshape, I then make one long sequence out of the subsequences of the different input buoys, because I read that an LSTM can't handle 4D arrays for timeseries prediction. The reshape turns the input into a 3D array.
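Concretely, the reshape looks like this (shapes as given above; 3 buoys × 42 steps = 126 timesteps per sample):

```python
# (61117, 3, 42, 2) -> (61117, 3 * 42, 2) == (61117, 126, 2)
reshaped_input_data = sequenced_input_data.reshape(
    sequenced_input_data.shape[0], -1, sequenced_input_data.shape[3]
)
reshaped_output_data = sequenced_output_data  # already 3D: (61117, 24, 2)
```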
I then normalize all the data using MinMaxScaler(), with separate scalers for the input and output data.
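Since MinMaxScaler only accepts 2D input, the 3D arrays are flattened for fitting and reshaped back afterwards; a sketch of that step (assuming this is roughly what my scaling code does):

```python
from sklearn.preprocessing import MinMaxScaler

input_scaler = MinMaxScaler()
output_scaler = MinMaxScaler()

# flatten (samples, timesteps, features) -> (samples * timesteps, features),
# scale per feature, then restore the 3D shape
n, t, f = reshaped_input_data.shape
reshaped_input_data = input_scaler.fit_transform(
    reshaped_input_data.reshape(-1, f)).reshape(n, t, f)

n, t, f = reshaped_output_data.shape
reshaped_output_data = output_scaler.fit_transform(
    reshaped_output_data.reshape(-1, f)).reshape(n, t, f)
```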
I then split the data into train and test data:
```python
from sklearn.model_selection import train_test_split

# split the data into train and test data; shuffle=False keeps the time
# order (random_state then has no effect)
input_train, input_test, output_train, output_test = train_test_split(
    reshaped_input_data, reshaped_output_data,
    test_size=0.2, shuffle=False, random_state=42)
```
The shapes are then:
```
input_train.shape = (48893, 126, 2)
input_test.shape = (12224, 126, 2)
output_train.shape = (48893, 24, 2)
output_test.shape = (12224, 24, 2)
```
Training of the model
```python
import tensorflow as tf
from tensorflow.keras import regularizers
from tensorflow.keras.callbacks import EarlyStopping
from tensorflow.keras.layers import Dense, Dropout, LSTM, Masking
from tensorflow.keras.models import Sequential
from tensorflow.keras.optimizers import Adam

tf.keras.backend.clear_session()
model = Sequential()
# mask padded timesteps; input is (timesteps, features) = (126, 2)
model.add(Masking(mask_value=0, input_shape=(input_train.shape[1], input_train.shape[2])))
model.add(LSTM(units=64, return_sequences=True, kernel_regularizer=regularizers.l2(0.01)))
model.add(Dropout(0.1))
# one dense output per timestep, with 2 features
model.add(Dense(units=output_test.shape[2]))
model.compile(loss='mean_absolute_error', optimizer=Adam(learning_rate=0.0001))

early_stopping = EarlyStopping(monitor="val_loss", verbose=2, mode='min', patience=3)
model.fit(x=input_train, y=output_train, epochs=10, validation_split=0.2,
          batch_size=32, callbacks=[early_stopping])

loss = model.evaluate(input_test, output_test)
print(f'Test loss: {loss}')
predictions = model.predict(input_test)
```
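As a sanity check before fitting, comparing the model's output shape with the target shape reproduces the 126-vs-24 mismatch from the error:

```python
# the model emits one prediction per input timestep, the targets have 24 steps
print(model(input_train[:1]).shape)   # (1, 126, 2)
print(output_train[:1].shape)         # (1, 24, 2)
```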
Can anybody explain to me what I'm doing wrong here? :) Hopefully I explained my code and problem well enough.
Hope to hear from you!
What I tried so far:
- Creating a custom loss function. This works, but since the answer in the link I provided works with different sequence lengths, there must be something wrong with my code.
- Using mean_absolute_error instead of the squared error; that didn't work either.
- Using a RepeatVector, but then the predictions became flat (see the sketch after this list).
- Using Flatten, but the same error happens.
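For completeness, this is roughly the RepeatVector variant I tried (a sketch from memory, building on the imports above; the exact layer sizes may have differed):

```python
from tensorflow.keras.layers import RepeatVector, TimeDistributed

model = Sequential()
model.add(Masking(mask_value=0,
                  input_shape=(input_train.shape[1], input_train.shape[2])))
# encoder: compress the 126 input timesteps into a single state vector
model.add(LSTM(units=64))
# decoder: repeat the state once per output timestep (24 times)
model.add(RepeatVector(seq_length_future))
model.add(LSTM(units=64, return_sequences=True))
model.add(TimeDistributed(Dense(units=output_test.shape[2])))
model.compile(loss='mean_absolute_error', optimizer=Adam(learning_rate=0.0001))
```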
PS: Would you recommend using stateful=True or False?