Keras with Theano: Loss decrease but accuracy not changing

This is my code. I tried to build an 11-layer VGG-style network with a mix of ReLU and ELU activations and several regularizers on both kernels and activities. The result is really confusing: the training is at the 10th epoch, and my loss on both train and val has decreased from 2000 to 1.5, but my accuracy on both train and val has stayed at 50%. Can somebody explain this to me?

# VGG 11
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout, Activation
from keras.layers.advanced_activations import ELU
from keras.regularizers import l2
from keras.optimizers import Adam

model = Sequential()

model.add(Conv2D(64, (3, 3), kernel_initializer='he_normal', 
          kernel_regularizer=l2(0.0001), activity_regularizer=l2(0.0001), 
          input_shape=(1, 96, 96), activation='relu'))
model.add(Conv2D(64, (3, 3), kernel_initializer='he_normal', 
          kernel_regularizer=l2(0.0001), activity_regularizer=l2(0.0001), 
          activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))

model.add(Conv2D(128, (3, 3), kernel_initializer='he_normal', 
          kernel_regularizer=l2(0.0001),activity_regularizer=l2(0.0001), 
          activation='relu'))
model.add(Conv2D(128, (3, 3), kernel_initializer='he_normal',     
          kernel_regularizer=l2(0.0001), activity_regularizer=l2(0.0001), 
          activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))

model.add(Conv2D(256, (3, 3), kernel_initializer='he_normal',     
          kernel_regularizer=l2(0.0001), activity_regularizer=l2(0.0001), 
          activation='relu'))
model.add(Conv2D(256, (3, 3), kernel_initializer='he_normal',     
          kernel_regularizer=l2(0.0001), activity_regularizer=l2(0.0001), 
          activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))

model.add(Conv2D(512, (3, 3), kernel_initializer='he_normal', 
          kernel_regularizer=l2(0.0001), activity_regularizer=l2(0.0001), 
          activation='relu'))
model.add(Conv2D(512, (3, 3), kernel_initializer='he_normal', 
          kernel_regularizer=l2(0.0001), activity_regularizer=l2(0.0001), 
          activation='relu'))
model.add(Conv2D(512, (3, 3), kernel_initializer='he_normal', 
          kernel_regularizer=l2(0.0001), activity_regularizer=l2(0.0001),     
          activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))

# flatten the convolutional feature maps so they can be fed to fully connected layers
model.add(Flatten())

model.add(Dense(2048, kernel_initializer='he_normal',
               kernel_regularizer=l2(0.0001), activity_regularizer=l2(0.01)))
model.add(ELU(alpha=1.0))
model.add(Dropout(0.5))

model.add(Dense(1024, kernel_initializer='he_normal',
               kernel_regularizer=l2(0.0001), activity_regularizer=l2(0.01)))
model.add(ELU(alpha=1.0))
model.add(Dropout(0.5))

model.add(Dense(2))
model.add(Activation('softmax'))

adammo = Adam(lr=0.0008, beta_1=0.9, beta_2=0.999, epsilon=1e-08, decay=0.0)
model.compile(loss='categorical_crossentropy', optimizer=adammo, metrics=['accuracy'])
hist = model.fit(X_train, y_train, batch_size=48, epochs=20, verbose=1, validation_data=(X_val, y_val))

1 Answer

Answered by modesitt (5 votes)

This is not a defect; in fact, it is entirely possible!

Categorical cross-entropy loss does not require that accuracy rise as the loss decreases: the loss measures how confident the predicted probabilities are, while accuracy only counts whether the argmax matches the label, so the loss can fall substantially without a single prediction flipping class. This is not a bug in Keras or Theano, but rather a network or data problem.
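
As a minimal illustration (with made-up probabilities, not numbers from the question), here is how the cross-entropy of a two-class prediction can fall sharply while the argmax, and therefore the accuracy, never changes:

import numpy as np

def categorical_crossentropy(y_true, y_pred):
    # mean cross-entropy over samples; y_true is one-hot, y_pred are probabilities
    return -np.mean(np.sum(y_true * np.log(y_pred), axis=1))

y_true = np.array([[1.0, 0.0]])   # the true class is 0

early = np.array([[0.02, 0.98]])  # very confident and wrong
later = np.array([[0.45, 0.55]])  # far less confident, still wrong

print(categorical_crossentropy(y_true, early))  # ~3.91
print(categorical_crossentropy(y_true, later))  # ~0.80
# the loss dropped from ~3.9 to ~0.8, but argmax picks class 1 both
# times, so accuracy stays at 0% for this sample

The same effect scales up: a network can spend many epochs reshaping its probabilities without moving any prediction across the decision boundary. A starting loss of 2000 also suggests that your regularization penalties, not the cross-entropy itself, dominate the total loss you are watching.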

This network structure is probably over-complicated for what you are trying to do. You should remove some of your regularization, use only ReLU, use fewer layers, use the standard Adam optimizer, a larger batch size, etc. Try first using one of Keras' default models, like VGG16.
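
For instance (a minimal sketch; the exact import path varies across Keras versions), the stock VGG16 can be loaded in a single call from keras.applications:

from keras.applications.vgg16 import VGG16

# the stock architecture; weights='imagenet' downloads pretrained weights,
# while weights=None gives the same network randomly initialized
model = VGG16(weights='imagenet', include_top=True)
model.summary()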

If you want to see their implementation so that you can edit it into a VGG11-like structure, it is here:

from keras.models import Sequential
from keras.layers import ZeroPadding2D, Convolution2D, MaxPooling2D
from keras.layers import Flatten, Dense, Dropout

def VGG_16(weights_path=None):
    model = Sequential()
    model.add(ZeroPadding2D((1,1), input_shape=(3,224,224)))
    model.add(Convolution2D(64, 3, 3, activation='relu'))
    model.add(ZeroPadding2D((1,1)))
    model.add(Convolution2D(64, 3, 3, activation='relu'))
    model.add(MaxPooling2D((2,2), strides=(2,2)))

    model.add(ZeroPadding2D((1,1)))
    model.add(Convolution2D(128, 3, 3, activation='relu'))
    model.add(ZeroPadding2D((1,1)))
    model.add(Convolution2D(128, 3, 3, activation='relu'))
    model.add(MaxPooling2D((2,2), strides=(2,2)))

    model.add(ZeroPadding2D((1,1)))
    model.add(Convolution2D(256, 3, 3, activation='relu'))
    model.add(ZeroPadding2D((1,1)))
    model.add(Convolution2D(256, 3, 3, activation='relu'))
    model.add(ZeroPadding2D((1,1)))
    model.add(Convolution2D(256, 3, 3, activation='relu'))
    model.add(MaxPooling2D((2,2), strides=(2,2)))

    model.add(ZeroPadding2D((1,1)))
    model.add(Convolution2D(512, 3, 3, activation='relu'))
    model.add(ZeroPadding2D((1,1)))
    model.add(Convolution2D(512, 3, 3, activation='relu'))
    model.add(ZeroPadding2D((1,1)))
    model.add(Convolution2D(512, 3, 3, activation='relu'))
    model.add(MaxPooling2D((2,2), strides=(2,2)))

    model.add(ZeroPadding2D((1,1)))
    model.add(Convolution2D(512, 3, 3, activation='relu'))
    model.add(ZeroPadding2D((1,1)))
    model.add(Convolution2D(512, 3, 3, activation='relu'))
    model.add(ZeroPadding2D((1,1)))
    model.add(Convolution2D(512, 3, 3, activation='relu'))
    model.add(MaxPooling2D((2,2), strides=(2,2)))

    model.add(Flatten())
    model.add(Dense(4096, activation='relu'))
    model.add(Dropout(0.5))
    model.add(Dense(4096, activation='relu'))
    model.add(Dropout(0.5))
    model.add(Dense(1000, activation='softmax'))

    if weights_path:
        model.load_weights(weights_path)

    return model

You can see it is much simpler. It only uses ReLU (which has become popular these days), has no regularization, uses a different convolution structure, and so on. Modify that to your needs!
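
As a closing sketch (an adaptation with assumed sizes, not the answerer's code, reusing the imports above): trimming the skeleton to the VGG11 layer counts (1, 1, 2, 2, 2 convolutions per block) and swapping in the question's 1x96x96 grayscale input and two-class softmax might look like this:

def VGG_11(weights_path=None):
    # same style as VGG_16 above, trimmed to VGG11 ("configuration A")
    model = Sequential()
    model.add(ZeroPadding2D((1,1), input_shape=(1,96,96)))  # grayscale 96x96 input from the question
    model.add(Convolution2D(64, 3, 3, activation='relu'))
    model.add(MaxPooling2D((2,2), strides=(2,2)))

    model.add(ZeroPadding2D((1,1)))
    model.add(Convolution2D(128, 3, 3, activation='relu'))
    model.add(MaxPooling2D((2,2), strides=(2,2)))

    model.add(ZeroPadding2D((1,1)))
    model.add(Convolution2D(256, 3, 3, activation='relu'))
    model.add(ZeroPadding2D((1,1)))
    model.add(Convolution2D(256, 3, 3, activation='relu'))
    model.add(MaxPooling2D((2,2), strides=(2,2)))

    model.add(ZeroPadding2D((1,1)))
    model.add(Convolution2D(512, 3, 3, activation='relu'))
    model.add(ZeroPadding2D((1,1)))
    model.add(Convolution2D(512, 3, 3, activation='relu'))
    model.add(MaxPooling2D((2,2), strides=(2,2)))

    model.add(ZeroPadding2D((1,1)))
    model.add(Convolution2D(512, 3, 3, activation='relu'))
    model.add(ZeroPadding2D((1,1)))
    model.add(Convolution2D(512, 3, 3, activation='relu'))
    model.add(MaxPooling2D((2,2), strides=(2,2)))

    model.add(Flatten())
    model.add(Dense(2048, activation='relu'))
    model.add(Dropout(0.5))
    model.add(Dense(2, activation='softmax'))  # two-class output from the question

    if weights_path:
        model.load_weights(weights_path)

    return model

With Theano's channels-first ordering, the 96x96 input halves to 3x3 after the five poolings, so Flatten feeds 512*3*3 = 4608 features into the dense head.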