I am doing a comparative study on a simple regression (one independent variable and one target variable) in two ways: LinearRegression vs. a neural network (NN, Keras API). My sample data is as follows:
x1 y
121.9114 121.856
121.856 121.4011
121.4011 121.3222
121.3222 121.9502
121.9502 122.0644
LinearRegression Code:
from sklearn.linear_model import LinearRegression
lr = LinearRegression()
lr.fit(X_train, y_train)
Note: The LR model gives me an RMSE of 0.22 consistently on every run.
NN Code:
nn_model = models.Sequential()
nn_model.add(layers.Dense(2, input_dim=1, activation='relu'))
nn_model.add(layers.Dense(1))
nn_model.compile(optimizer='adam', loss='mse', metrics=['mae'])
nn_model.fit(X_train, y_train, epochs=40, batch_size=32)
Training Loss:
Epoch 1/40
539/539 [==============================] - 0s 808us/sample - loss: 16835.0895 - mean_absolute_error: 129.5276
Epoch 2/40
539/539 [==============================] - 0s 163us/sample - loss: 16830.6868 - mean_absolute_error: 129.5106
Epoch 3/40
539/539 [==============================] - 0s 204us/sample - loss: 16826.2856 - mean_absolute_error: 129.4935
...
Epoch 39/40
539/539 [==============================] - 0s 187us/sample - loss: 16668.3582 - mean_absolute_error: 128.8823
Epoch 40/40
539/539 [==============================] - 0s 168us/sample - loss: 16663.9828 - mean_absolute_error: 128.8654
The NN-based solution gives me RMSE = 136.7476.
Interestingly, the NN-based solution gives me a different RMSE on each run, because the training loss is different on each run.
For example, in the first run shown above, the loss starts at 16835 and the final loss at the 40th epoch is 16663. In this case the model gives me RMSE = 136.74.
If I run the same code a second time, the loss starts at 16144 and the final loss at the 40th epoch is 5. In this case the RMSE comes to 7.3.
Sometimes I also see an RMSE of 0.22, when the training loss starts at 400 and ends (40th epoch) at 0.06.
This Keras behavior makes it hard for me to tell whether there is a problem with the Keras API, whether I am doing something wrong, or whether this problem is not suitable for Keras.
Could you please help me understand the issue, and what would be the best way to stabilize the NN-based solution?
Some Additional Info:
- My training and test data are always fixed, so no data is shuffled.
- Number of records in train data = 539
- Number of records in test data = 154
- I also tried MinMaxScaling on train and test, but it doesn't bring stability to the predictions.
There are multiple questions regarding the consistency/reproducibility of Keras. I answered one of them here a while ago, and since then I have realized that some other modifications are needed to achieve consistency:
According to the Keras FAQ and this Kaggle experiment, you CANNOT achieve consistency if you are using GPU processing. So they recommend that you set
CUDA_VISIBLE_DEVICES=""
and fix the Python hash seed with
PYTHONHASHSEED=0
(this must be done outside the script you're using Keras in).
You also have to set some seeds:
1) NumPy random seed
2) TensorFlow random seed
3) Python random seed
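For illustration, the three seeds listed above could be set in one helper like the sketch below. The function name `set_global_seeds` and the seed value 42 are hypothetical, not part of Keras. NumPy and TensorFlow are only seeded if they can be imported, and the TensorFlow call is picked by version (`tf.random.set_seed` in 2.x, `tf.set_random_seed` in 1.x).

```python
import random


def set_global_seeds(seed):
    """Seed Python, NumPy and TensorFlow (the latter two only if installed)."""
    # Import the optional libraries first so that nothing runs between
    # seeding and the first random draw.
    try:
        import numpy as np
    except ImportError:
        np = None
    try:
        import tensorflow as tf
    except ImportError:
        tf = None

    seeded = ["random"]
    random.seed(seed)                      # Python's built-in generator

    if np is not None:
        np.random.seed(seed)               # NumPy's global generator
        seeded.append("numpy")

    if tf is not None:
        if hasattr(tf.random, "set_seed"):
            tf.random.set_seed(seed)       # TensorFlow 2.x
        else:
            tf.set_random_seed(seed)       # TensorFlow 1.x
        seeded.append("tensorflow")

    return seeded


# Hypothetical usage: seeding twice with the same value repeats the draw.
set_global_seeds(42)
first_draw = random.random()
set_global_seeds(42)
second_draw = random.random()
print(first_draw == second_draw)
```

The helper would need to be called before the model is built, since the layer weights are drawn when the layers are created.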
Additionally, you have to set two arguments on model.fit (if you have multiprocessing capabilities). These are not often mentioned in the answers I've seen around.
Make sure that you are training your model on a CPU. Later versions of tensorflow-gpu might be able to identify and select a GPU even when you set CUDA_VISIBLE_DEVICES="".
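Because both environment variables have to be in place before the Python interpreter starts, one option is to launch the training code from a small wrapper process. The sketch below only illustrates the idea: `run_with_fixed_env` and the `probe` one-liner are hypothetical names, and the probe prints a string hash instead of training anything.

```python
import os
import subprocess
import sys


def run_with_fixed_env(code, hash_seed="0"):
    """Run `code` in a fresh interpreter with both variables preset."""
    env = dict(os.environ)
    env["PYTHONHASHSEED"] = hash_seed    # fixed seed for str/bytes hashing
    env["CUDA_VISIBLE_DEVICES"] = ""     # hide CUDA devices from the child
    result = subprocess.run(
        [sys.executable, "-c", code],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


# The child prints a string hash and the value it sees for the GPU variable.
probe = (
    "import os; "
    "print(hash('keras'), repr(os.environ.get('CUDA_VISIBLE_DEVICES', '')))"
)
first = run_with_fixed_env(probe)
second = run_with_fixed_env(probe)
print(first == second)
```

In practice the `-c` probe would be replaced by the path to the training script. The two runs print the same hash because the hash seed is fixed before the child interpreter starts, which cannot be done from inside an already running script.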