MAE not reducing in CNN-BiLSTM


I want to do time series forecasting of length of day (LOD) using a CNN-BiLSTM model, predicting 7 days into the future at each timestep. I detrend the data with a polynomial fit followed by a least-squares fit, feed the residuals into the model for forecasting, and calculate the error after retrending the predictions. I have tried several variations of LSTM models, but the mean absolute error does not drop below about 0.4. The whole approach was taken from a research paper in which the authors compared prediction results for 7, 14, and 28 input timesteps with a CNN model; the mean absolute error they reported was around 0.01.

Please suggest something that would reduce the error or point out where I am going wrong.

DETRENDING STEP-1: DETRENDING USING POLYNOMIAL CURVE FIT

import numpy as np
import matplotlib.pyplot as plt

x_d = dataset.index.values
y_d = dataset['LOD']

plt.figure(figsize=(20, 15))
for i in range(1, 10):
    # fit a polynomial of order i and plot it over the data
    coefficients = np.polyfit(x_d, y_d, i)
    plt.plot(x_d, np.polyval(coefficients, x_d), label=f'order {i}')
plt.plot(x_d, y_d, alpha=0.3, label='LOD')
plt.legend()

Plot: different polynomial fits.

chosen_order = 3
coefficients = np.polyfit(x_d, y_d, chosen_order)
polynomial_fit = np.polyval(coefficients, x_d)
detrended_data = y_d - polynomial_fit
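
The order of 3 was chosen from the plots above; as a quick numerical cross-check (a minimal sketch reusing x_d and y_d from the block above), the residual RMSE of each candidate order can be compared:

# compare residual RMSE for each candidate polynomial order
for order in range(1, 10):
    coeffs = np.polyfit(x_d, y_d, order)
    resid = y_d - np.polyval(coeffs, x_d)
    print(f"order {order}: residual RMSE = {np.sqrt(np.mean(resid ** 2)):.6f}")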

Plot: detrended data using the polynomial fit.

DETRENDING STEP-2: DETRENDING USING LEAST SQUARE FIT

The least-squares fitting was done according to the following equation:

f_LODR(t) = α₀ + α₁·t + A₁·sin(w₁·t + φ₁) + A₂·sin(w₂·t + φ₂) + A₃·sin(w₃·t + φ₃)

where α₀ is the bias, α₁ is the drift of the linear term, w₁ = 2π/365.24, w₂ = 2π/182.62, and w₃ = 2π · 1.19 · 10⁻⁴.

from scipy.optimize import curve_fit

def model_equation(t, alpha_0, alpha_1, A1, A2, A3, phi1, phi2, phi3):
    w1 = 2 * np.pi / 365.24
    w2 = 2 * np.pi / 182.62
    w3 = 2 * np.pi * 1.19e-4
    return alpha_0 + alpha_1 * t + \
           A1 * np.sin(w1 * t + phi1) + \
           A2 * np.sin(w2 * t + phi2) + \
           A3 * np.sin(w3 * t + phi3)

# Time values and LODR data (the residuals from the polynomial detrending)
t_values = detrended_data.index.values
LODR_values = np.array(detrended_data)

# Initial guesses for the parameters
initial_guess = (1.0, 0.1, 0.5, 0.5, 0.5, 0.0, 0.0, 0.0)

# Perform curve fitting using least squares
popt, pcov = curve_fit(model_equation, t_values, LODR_values, p0=initial_guess)

# Extract estimated parameters
alpha_0, alpha_1, A1, A2, A3, phi1, phi2, phi3 = popt

# Calculate fitted values (LODR_fit) and residuals
LODR_fit = model_equation(t_values, *popt)
residuals = LODR_values - LODR_fit
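
Before moving on, it can be worth checking that the fit converged to something sensible; a small sketch (using the variables defined above) that prints the fitted parameters and the spread left in the residuals:

# inspect the fitted parameters and the remaining spread of the residuals
print("alpha_0 = %.6f, alpha_1 = %.6e" % (alpha_0, alpha_1))
print("amplitudes:", A1, A2, A3)
print("phases:", phi1, phi2, phi3)
print("residual RMS: %.6f" % np.sqrt(np.mean(residuals ** 2)))
print("residual range: [%.4f, %.4f]" % (residuals.min(), residuals.max()))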

Plot: LOD, trend terms, and the LODR residual series.

FURTHER STEPS

df1 = residuals

from sklearn.preprocessing import MinMaxScaler

scaler = MinMaxScaler(feature_range=(0, 1))
df1 = scaler.fit_transform(np.array(df1).reshape(-1, 1))

training_size = 7000
test_size = len(df1) - training_size
train_data, test_data = df1[0:training_size, :], df1[training_size:len(df1), :1]

def split_sequences(sequences, time_step, n_steps_out):
    X, y = list(), list()
    for i in range(len(sequences)):
        end_ix = i + time_step
        out_end_ix = end_ix + n_steps_out - 1
        if out_end_ix > len(sequences):
            break
        # input window of `time_step` rows, target window of `n_steps_out` values
        seq_x, seq_y = sequences[i:end_ix, :], sequences[end_ix-1:out_end_ix, -1]
        X.append(seq_x)
        y.append(seq_y)
    return np.array(X), np.array(y)
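
Note that because of the end_ix-1 index, the first target value coincides with the last input step. A tiny example makes the window layout visible (assuming the function above, with a toy 3-step input and 3-step output):

# toy check of the windowing on 10 consecutive values
toy = np.arange(10).reshape(-1, 1)
X_toy, y_toy = split_sequences(toy, time_step=3, n_steps_out=3)
print(X_toy[0].ravel())  # [0 1 2] -> input window
print(y_toy[0])          # [2 3 4] -> target starts at the last input value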

time_step = 7
n_steps_out = 7

X_train, y_train = split_sequences(train_data, time_step, n_steps_out)
X_test, ytest = split_sequences(test_data, time_step, n_steps_out)

which gives us:

SHAPES:
y_train.shape: (6988, 7)
X_train.shape: (6988, 7, 1)
ytest.shape: (1024, 7)
X_test.shape: (1024, 7, 1)

MODEL STRUCTURE

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv1D, Flatten, RepeatVector, Bidirectional, LSTM, Dense
from tensorflow.keras.metrics import MeanAbsoluteError

model = Sequential()
model.add(Conv1D(filters=64, kernel_size=4, activation='relu', input_shape=(time_step, 1)))
model.add(Conv1D(filters=64, kernel_size=4, activation='relu'))
# model.add(MaxPooling1D(pool_size=2, padding='valid'))
model.add(Flatten())
model.add(RepeatVector(1))
model.add(Bidirectional(LSTM(200, return_sequences=True)))
model.add(Bidirectional(LSTM(100, return_sequences=True)))
model.add(Bidirectional(LSTM(50)))
model.add(Dense(25, activation='relu'))
model.add(Dense(7, activation='linear'))
model.compile(optimizer='adam', loss='mean_absolute_error', metrics=[MeanAbsoluteError()])
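
For reference, with kernel_size=4 and valid padding the two Conv1D layers shrink the 7-step window to length 4 and then length 1, so after Flatten and RepeatVector(1) the BiLSTM stack sees a single-timestep sequence. The layer-by-layer shapes can be confirmed with:

# print output shapes of every layer to see how the 7-step window is transformed
model.summary()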

from tensorflow.keras.callbacks import EarlyStopping

early_stopping = EarlyStopping(monitor='mean_absolute_error', patience=10, restore_best_weights=True)
history = model.fit(X_train, y_train, validation_data=(X_test, ytest), epochs=500, batch_size=256, verbose=1, callbacks=[early_stopping])

Plot: training and validation loss.

RETRENDING

train_predict = model.predict(X_train)
test_predict = model.predict(X_test)

# Transform predicted data back to original form
train_predict = scaler.inverse_transform(train_predict)
test_predict = scaler.inverse_transform(test_predict)

polynomial_fit = polynomial_fit.reshape(-1,1)
LODR_fit = LODR_fit.reshape(-1,1)

# Retrending predictions
train_predict = train_predict + polynomial_fit[:len(train_predict)] + LODR_fit[:len(train_predict)]
test_predict = test_predict + polynomial_fit[-len(test_predict):] + LODR_fit[-len(test_predict):]
SHAPES:
train_predict.shape:  (6988, 7)
test_predict.shape:  (1024, 7)
polynomial_fit.shape:  (8036, 1)
LODR_fit.shape:  (8036, 1)
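
To make the retrending arithmetic explicit: adding an (N, 1) column vector to an (N, 7) prediction matrix broadcasts each row's single trend value across all seven forecast columns. A toy example of that broadcasting:

# NumPy broadcasting: a (3, 1) column added to a (3, 2) matrix
preds = np.array([[0.1, 0.2],
                  [0.3, 0.4],
                  [0.5, 0.6]])
trend = np.array([[10.0], [20.0], [30.0]])
print(preds + trend)
# [[10.1 10.2]
#  [20.3 20.4]
#  [30.5 30.6]]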

ERROR CALCULATION

from sklearn.metrics import mean_absolute_error

# Train data MAE
train_mae = mean_absolute_error(y_train, train_predict)
print('TRAIN DATA MAE:', train_mae)

# Test data MAE
test_mae = mean_absolute_error(ytest, test_predict)
print('TEST DATA MAE:', test_mae)
OUTPUT:
TRAIN DATA MAE: 0.518224508907335
TEST DATA MAE: 0.42549170253419616
errors = {}

for day in range(1, 8):
    absolute_error = abs(ytest[:, day - 1] - test_predict[:, day - 1]).mean()
    errors[f"Day {day}"] = absolute_error

# Display errors
for day, error in errors.items():
    print(f"{day}: {error:.4f}")
OUTPUT:
Day 1: 0.4361
Day 2: 0.4357
Day 3: 0.4318
Day 4: 0.4286
Day 5: 0.4239
Day 6: 0.4154
Day 7: 0.4069

These per-day errors should be much lower, around 0.01, to match the results reported in the paper.
