I want to do time series forecasting of length of day (LOD) with a CNN-BiLSTM model, predicting the next 7 days at each timestep. I first detrend the data with a polynomial fit and then with a least-squares fit of a linear-plus-harmonic model, feed the residuals to the network, and compute the error after retrending the predictions. I have tried several variations of LSTM models, but the mean absolute error does not drop below about 0.4. The whole approach is taken from a research paper in which the authors compare prediction results for 7, 14 and 28 timesteps with a CNN model and report a mean absolute error of around 0.01.
Please suggest something that would reduce the error, or point out where I am going wrong.
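To make the overall flow explicit before the details, here is a minimal sketch of the pipeline I described, on synthetic data and with a persistence placeholder standing in for the actual model (illustrative only):

import numpy as np
from sklearn.metrics import mean_absolute_error

t = np.arange(1000)
series = 0.001 * t + np.sin(2 * np.pi * t / 365.24) + 0.05 * np.random.randn(1000)

coeffs = np.polyfit(t, series, 3)            # fit the long-term trend
trend = np.polyval(coeffs, t)
residuals = series - trend                   # detrend
pred_resid = np.roll(residuals, 1)           # placeholder "forecast" of the residuals
pred = pred_resid + trend                    # retrend the predictions
print(mean_absolute_error(series, pred))     # error in the original LOD units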
DETRENDING STEP 1: POLYNOMIAL CURVE FIT
import numpy as np
import matplotlib.pyplot as plt
x_d = dataset.index.values
y_d = dataset['LOD']
plt.figure(figsize=(20, 15))
plt.plot(x_d, y_d, label='LOD')
for i in range(1, 10):
    # fit a polynomial of order i and overlay it on the data for comparison
    coeffs = np.polyfit(x_d, y_d, i)
    plt.plot(x_d, np.polyval(coeffs, x_d), label=f'order {i}')
plt.legend()
Different polynomial fits:
chosen_order = 3
coefficients = np.polyfit(x_d, y_d, chosen_order)
polynomial_fit = np.polyval(coefficients, x_d)
detrended_data = y_d - polynomial_fit
Detrended data using polynomial fit:
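A quick way I check that the order-3 fit actually removed the long-term trend (illustrative only) is to fit a straight line to the detrended series and confirm its slope is close to zero:

# slope of a linear fit to the detrended data; should be ~0 if the trend is gone
slope, intercept = np.polyfit(x_d, detrended_data, 1)
print("residual slope:", slope)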
DETRENDING STEP 2: LEAST-SQUARES FIT
The least-squares fitting was done according to the following equation: f_LODR(t) = α0 + α1·t + A1·sin(w1·t + φ1) + A2·sin(w2·t + φ2) + A3·sin(w3·t + φ3), where α0 is the bias, α1 is the drift of the linear term, w1 = 2π/365.24, w2 = 2π/182.62, and w3 = 2π · 1.19 · 10⁻⁴.
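As a quick numeric check on these frequencies (illustrative only), the implied periods 2π/w come out to roughly 365.24, 182.62 and 8403 days:

import numpy as np
# convert each angular frequency back to a period in days
for w in (2 * np.pi / 365.24, 2 * np.pi / 182.62, 2 * np.pi * 1.19e-4):
    print("period [days]:", 2 * np.pi / w)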
from scipy.optimize import curve_fit

def model_equation(t, alpha_0, alpha_1, A1, A2, A3, phi1, phi2, phi3):
    w1 = 2 * np.pi / 365.24
    w2 = 2 * np.pi / 182.62
    w3 = 2 * np.pi * 1.19e-4
    return alpha_0 + alpha_1 * t + \
           A1 * np.sin(w1 * t + phi1) + \
           A2 * np.sin(w2 * t + phi2) + \
           A3 * np.sin(w3 * t + phi3)
# Time values and detrended LODR data
t_values = detrended_data.index.values
LODR_values = np.array(detrended_data)
# Initial guesses for the parameters
initial_guess = (1.0, 0.1, 0.5, 0.5, 0.5, 0.0, 0.0, 0.0)
# Perform curve fitting using least squares
popt, pcov = curve_fit(model_equation, t_values, LODR_values, p0=initial_guess)
# Extract estimated parameters
alpha_0, alpha_1, A1, A2, A3, phi1, phi2, phi3 = popt
# Calculate fitted values (LODR_fit) and residuals
LODR_fit = model_equation(t_values, *popt)
residuals = LODR_values - LODR_fit
LOD, trend terms, LODR residual series:
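As a rough check of how much of the detrended signal the harmonic fit captures (illustrative only), I look at the explained variance and the residual spread:

# fraction of variance of the detrended series explained by the least-squares fit
ss_res = np.sum(residuals ** 2)
ss_tot = np.sum((LODR_values - LODR_values.mean()) ** 2)
print("R^2 of the least-squares fit:", 1 - ss_res / ss_tot)
print("residual std:", residuals.std())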
FURTHER STEPS
df1 = residuals
from sklearn.preprocessing import MinMaxScaler

scaler = MinMaxScaler(feature_range=(0, 1))
df1 = scaler.fit_transform(np.array(df1).reshape(-1, 1))

training_size = 7000
test_size = len(df1) - training_size
train_data, test_data = df1[0:training_size, :], df1[training_size:len(df1), :1]
def split_sequences(sequences, time_step, n_steps_out):
    X, y = list(), list()
    for i in range(len(sequences)):
        end_ix = i + time_step
        out_end_ix = end_ix + n_steps_out - 1
        if out_end_ix > len(sequences):
            break
        seq_x, seq_y = sequences[i:end_ix, :], sequences[end_ix-1:out_end_ix, -1]
        X.append(seq_x)
        y.append(seq_y)
    return np.array(X), np.array(y)
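To be explicit about what split_sequences produces, here is a toy run (illustrative only); note that the target window starts at the last input row (end_ix - 1), so its first value coincides with the last input value:

demo = np.arange(10).reshape(-1, 1)
Xd, yd = split_sequences(demo, 3, 2)
print(Xd[0].ravel(), yd[0])   # [0 1 2] [2 3]
print(Xd[1].ravel(), yd[1])   # [1 2 3] [3 4]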
time_step = 7
n_steps_out = 7
X_train, y_train = split_sequences(train_data, time_step, n_steps_out)
X_test, ytest = split_sequences(test_data, time_step, n_steps_out)
which gives us:
SHAPES:
y_train.shape: (6988, 7)
X_train.shape: (6988, 7, 1)
ytest.shape: (1024, 7)
X_test.shape: (1024, 7, 1)
MODEL STRUCTURE
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv1D, Flatten, RepeatVector, Bidirectional, LSTM, Dense
from tensorflow.keras.callbacks import EarlyStopping
from tensorflow.keras.metrics import MeanAbsoluteError

model = Sequential()
model.add(Conv1D(filters=64, kernel_size=4, activation='relu', input_shape=(time_step, 1)))
model.add(Conv1D(filters=64, kernel_size=4, activation='relu'))
# model.add(MaxPooling1D(pool_size=2, padding='valid'))
model.add(Flatten())
model.add(RepeatVector(1))
model.add(Bidirectional(LSTM(200, return_sequences=True)))
model.add(Bidirectional(LSTM(100, return_sequences=True)))
model.add(Bidirectional(LSTM(50)))
model.add(Dense(25, activation='relu'))
model.add(Dense(7, activation='linear'))
model.compile(optimizer='adam', loss='mean_absolute_error', metrics=[MeanAbsoluteError()])

early_stopping = EarlyStopping(monitor='mean_absolute_error', patience=10, restore_best_weights=True)
history = model.fit(X_train, y_train, validation_data=(X_test, ytest), epochs=500, batch_size=256, verbose=1, callbacks=[early_stopping])
Training and validation loss:
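For reference, with these settings (default 'valid' padding) the sequence length through the convolutional part is L_out = L_in - kernel_size + 1, which model.summary() confirms:

# 7 -> 4 after the first Conv1D, 4 -> 1 after the second, so Flatten yields 64
# features and RepeatVector(1) feeds a single timestep into the BiLSTM stack
for L_in, k in [(7, 4), (4, 4)]:
    print(L_in, "->", L_in - k + 1)
model.summary()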
RETRENDING
train_predict = model.predict(X_train)
test_predict = model.predict(X_test)
# Transform predicted data back to original form
train_predict = scaler.inverse_transform(train_predict)
test_predict = scaler.inverse_transform(test_predict)
polynomial_fit = polynomial_fit.reshape(-1,1)
LODR_fit = LODR_fit.reshape(-1,1)
# Retrending predictions
train_predict = train_predict + polynomial_fit[:len(train_predict)] + LODR_fit[:len(train_predict)]
test_predict = test_predict + polynomial_fit[-len(test_predict):] + LODR_fit[-len(test_predict):]
SHAPES:
train_predict.shape: (6988, 7)
test_predict.shape: (1024, 7)
polynomial_fit.shape: (8036, 1)
LODR_fit.shape: (8036, 1)
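For completeness, the retrending above relies on NumPy broadcasting: adding an (N, 1) trend column to the (N, 7) prediction matrix repeats row i's single trend value across all 7 forecast days of sample i, as in this toy example:

preds = np.zeros((2, 7))
trend_col = np.array([[10.0], [20.0]])
print(preds + trend_col)
# [[10. 10. 10. 10. 10. 10. 10.]
#  [20. 20. 20. 20. 20. 20. 20.]]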
ERROR CALCULATION
from sklearn.metrics import mean_absolute_error
# Train data MAE
train_mae = mean_absolute_error(y_train, train_predict)
print('TRAIN DATA MAE:', train_mae)
# Test data MAE
test_mae = mean_absolute_error(ytest, test_predict)
print('TEST DATA MAE:', test_mae)
OUTPUT:
TRAIN DATA MAE: 0.518224508907335
TEST DATA MAE: 0.42549170253419616
errors = {}
for day in range(1, 8):
    absolute_error = abs(ytest[:, day - 1] - test_predict[:, day - 1]).mean()
    errors[f"Day {day}"] = absolute_error

# Display errors
for day, error in errors.items():
    print(f"{day}: {error:.4f}")
OUTPUT:
Day 1: 0.4361
Day 2: 0.4357
Day 3: 0.4318
Day 4: 0.4286
Day 5: 0.4239
Day 6: 0.4154
Day 7: 0.4069
These errors should be much lower, around 0.01 as reported in the paper.