Addressing Saddle Points in Keras Model Training


My Keras model seems to have hit a saddle point in its training. Of course this is just an assumption; I'm not really sure. In any case, the loss plateaus at .0025 and nothing I have tried has reduced it any further.

What I have tried so far is:

  1. Using Adam and RMSProp with and without cyclical learning rates. The result is that the loss starts at .0989 and stays there. The cyclical learning rate range was .001 to .1.

  2. After 4 or 5 epochs of no movement I tried SGD instead, and the loss steadily declined to .0025. This is where the loss stalls out. After about 5 epochs of no change I tried SGD with cyclical learning enabled, hoping it would decrease further, but I get the same result. (A minimal sketch of that SGD-with-CLR setup follows this list.)

  3. I have tried increasing network capacity (as well as decreasing it), thinking maybe the network had hit its learning limits. I increased all 4 dense layers to 4096. That didn't change anything.

  4. I've tried different batch sizes.
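
For concreteness, the SGD-with-cyclical-LR attempt in item 2 was set up roughly like this; it assumes the same lr_range, steps_per_epoch, and CyclicalLearningRate schedule defined in the full listing further down, with only the optimizer swapped:

import tensorflow as tf
import tensorflow_addons as tfa

# Same triangular2-style cyclical schedule as in the main listing below
clr = tfa.optimizers.CyclicalLearningRate(
    initial_learning_rate=0.001,
    maximal_learning_rate=0.1,
    scale_fn=lambda x: 1 / (2.0 ** (x - 1)),
    step_size=2 * steps_per_epoch,
)
optimizer = tf.keras.optimizers.SGD(clr)  # swap Adam for SGD
model.compile(optimizer=optimizer, loss='mean_squared_error')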

The most epochs I have trained the network for is 7. However, for 6 of those epochs neither the loss nor the validation loss changed. Do I need to train for more epochs, or could it be that .0025 is not a saddle point but the global minimum for my dataset? I would think there is more room for it to improve. I tested the predictions of the network at .0025 and they aren't that great.
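
The prediction test was nothing more elaborate than the usual predict/evaluate pair on the validation set (a sketch; valid_ds is the same tf.data validation dataset used in the fit call below):

# Rough check of the model at the .0025 plateau
preds = model.predict(valid_ds)                    # inspect a few predictions by eye
print(model.evaluate(valid_ds, return_dict=True))  # MSE on the validation set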

Any advice on how to continue? My code is below.

For starters, my Keras model is similar in style to VGG-16:

# imports
# Run first in a shell or notebook cell: pip install -q -U tensorflow_addons
import tensorflow_addons as tfa
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

def get_model(input_shape):
    inputs = keras.Input(shape=input_shape)

    # Block 1: two 64-filter convs, then downsample
    x = layers.Conv2D(filters=64, kernel_size=(3, 3), activation='relu', padding='same')(inputs)
    x = layers.Conv2D(filters=64, kernel_size=(3, 3), activation='relu', padding='same')(x)
    x = layers.MaxPooling2D(pool_size=(2, 2), strides=None, padding='same')(x)

    # Block 2: two 128-filter convs
    x = layers.Conv2D(filters=128, kernel_size=(3, 3), activation='relu', padding='same')(x)
    x = layers.Conv2D(filters=128, kernel_size=(3, 3), activation='relu', padding='same')(x)
    x = layers.MaxPooling2D(pool_size=(2, 2), strides=None, padding='same')(x)

    # Block 3: four 256-filter convs
    x = layers.Conv2D(filters=256, kernel_size=(3, 3), activation='relu', padding='same')(x)
    x = layers.Conv2D(filters=256, kernel_size=(3, 3), activation='relu', padding='same')(x)
    x = layers.Conv2D(filters=256, kernel_size=(3, 3), activation='relu', padding='same')(x)
    x = layers.Conv2D(filters=256, kernel_size=(3, 3), activation='relu', padding='same')(x)
    x = layers.MaxPooling2D(pool_size=(2, 2), strides=None, padding='same')(x)

    # Block 4: four 512-filter convs
    x = layers.Conv2D(filters=512, kernel_size=(3, 3), activation='relu', padding='same')(x)
    x = layers.Conv2D(filters=512, kernel_size=(3, 3), activation='relu', padding='same')(x)
    x = layers.Conv2D(filters=512, kernel_size=(3, 3), activation='relu', padding='same')(x)
    x = layers.Conv2D(filters=512, kernel_size=(3, 3), activation='relu', padding='same')(x)
    x = layers.MaxPooling2D(pool_size=(2, 2), strides=None, padding='same')(x)

    # Classifier head
    x = layers.Flatten()(x)
    x = layers.Dense(4096, activation='relu')(x)
    x = layers.Dense(2048, activation='relu')(x)
    x = layers.Dense(1024, activation='relu')(x)
    x = layers.Dense(512, activation='relu')(x)

    # 9 sigmoid outputs, trained with MSE below
    output = layers.Dense(9, activation='sigmoid')(x)
    return keras.Model(inputs=inputs, outputs=output)

# define learning rate range
lr_range = [.001, .1]
epochs = 100
batch_size = 32
# based on https://www.tensorflow.org/addons/tutorials/optimizers_cyclicallearningrate
steps_per_epoch = len(training_data) // batch_size
clr = tfa.optimizers.CyclicalLearningRate(
    initial_learning_rate=lr_range[0],
    maximal_learning_rate=lr_range[1],
    scale_fn=lambda x: 1 / (2. ** (x - 1)),  # halve the amplitude each cycle (triangular2 policy)
    step_size=2 * steps_per_epoch,
)
optimizer = tf.keras.optimizers.Adam(clr)

model = get_model((224, 224, 3))
model.compile(optimizer=optimizer, loss='mean_squared_error')
# train_ds/valid_ds are tf.data.Dataset objects and are already batched,
# so batch_size must not be passed to fit()
model.fit(train_ds, validation_data=valid_ds, epochs=epochs)
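
For completeness, here is a minimal sketch of how the cyclical schedule could be verified during training, by logging the effective learning rate each epoch. This assumes TF 2.x-era optimizers, where optimizer.learning_rate can hold a LearningRateSchedule; the LRLogger callback name is made up for illustration.

class LRLogger(tf.keras.callbacks.Callback):
    """Prints the current learning rate at the end of each epoch."""
    def on_epoch_end(self, epoch, logs=None):
        lr = self.model.optimizer.learning_rate
        if callable(lr):  # a schedule is a callable of the global step
            lr = lr(self.model.optimizer.iterations)
        print(f' epoch {epoch}: lr={float(lr):.5f}')

model.fit(train_ds, validation_data=valid_ds, epochs=epochs, callbacks=[LRLogger()])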