I engineer features for the spot and USD-M futures markets of BTCUSDT, then outer-join the two tables on their timestamps. Because trades in the spot and futures markets happen at different times, the joined table contains many NaN values. So I add a Masking layer before the first LSTM layer in order to ignore rows of NaN values, and I fill the NaNs after the MinMaxScaler() step rather than using a padding function (pre or post).
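For context, the join step looks roughly like this (df_spot, df_fut and the column names below are just placeholders for my engineered feature tables, not the exact code):

import pandas as pd

# toy example of the join step: feature tables indexed by trade timestamp
df_spot = pd.DataFrame(
    {"spot_close": [100.0, 101.0]},
    index=pd.to_datetime(["2023-01-01 00:00:01", "2023-01-01 00:00:03"]),
)
df_fut = pd.DataFrame(
    {"fut_close": [100.5, 101.5]},
    index=pd.to_datetime(["2023-01-01 00:00:02", "2023-01-01 00:00:03"]),
)

# outer join on timestamps: rows where only one market traded
# get NaN in the other market's columns
joined = df_spot.join(df_fut, how="outer")
values = joined.values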
# normalize features (MinMaxScaler ignores NaNs when fitting, so they pass through unchanged)
from sklearn.preprocessing import MinMaxScaler
import pandas as pd

scaler = MinMaxScaler(feature_range=(0, 1))
scaled = scaler.fit_transform(values)

# fill NaN with the value I later use as mask_value
df_scaled = pd.DataFrame(scaled)
df_scaled.fillna(0.9191, inplace=True)
scaled_fill = df_scaled.values
print(scaled_fill)
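After that I window the filled 2-D array into the [samples, timesteps, features] shape the LSTM expects; roughly like the sketch below (n_timesteps, the 80/20 split and the target construction are placeholders, not my exact code):

import numpy as np

# reshape the filled 2-D array into [samples, timesteps, features]
n_timesteps = 10                                   # placeholder window length
n_features = scaled_fill.shape[1]
n_samples = scaled_fill.shape[0] // n_timesteps
windows = scaled_fill[: n_samples * n_timesteps].reshape(n_samples, n_timesteps, n_features)

# chronological split (no shuffling, since this is a time series);
# train_y / validation_y are built separately from the target column
split = int(n_samples * 0.8)
train_X, validation_X = windows[:split], windows[split:]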
But I noticed something strange: different masking values change the val_loss!
# design network
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Masking, LSTM, Dense

model = Sequential()
# add a Masking layer before the LSTM layer; mask_value is a float
model.add(Masking(mask_value=0.9191, input_shape=(train_X.shape[1], train_X.shape[2])))
model.add(LSTM(50))
model.add(Dense(1))
model.compile(loss='mae', optimizer='adam')

# ... (tensorboard_callback, early_stopping and model_checkpoint are defined here)

# fit network
history = model.fit(
    train_X,
    train_y,
    epochs=100,
    batch_size=50,
    validation_data=(validation_X, validation_y),
    verbose=2,
    shuffle=False,
    callbacks=[tensorboard_callback, early_stopping, model_checkpoint],
)
When I fill the NaN values with 0.00009 the (average) val_loss is about 0.37, and with 9999 it is about 78389. I have already made sure that the masking value does not occur anywhere in the scaled data.
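This is the kind of sanity check I ran, plus one way to inspect which timesteps the Masking layer actually masks (a rough sketch; sample_X is just a placeholder batch taken from train_X):

import numpy as np
import tensorflow as tf

# check that the mask value never occurs naturally in the scaled data
print(np.any(np.isclose(scaled, 0.9191)))   # should print False

# inspect which timesteps the Masking layer really masks
sample_X = train_X[:2]                      # placeholder batch
masking_layer = tf.keras.layers.Masking(mask_value=0.9191)
mask = masking_layer.compute_mask(tf.constant(sample_X, dtype=tf.float32))
print(mask.numpy())                         # False = timestep is masked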
In fact, I thought the Masking layer simply ignores the rows that contain the masking value. Why do different masking values affect the val_loss?