I'm fine-tuning Keras' ResNet50, pre-trained on ImageNet, for a specific classification task on another dataset of images. My model is structured as follows: ResNet50 takes the inputs, and on top of it I added my own classifier. In all the experiments I tried, the model either underfits or overfits.
I mainly tried two approaches:

1. Freeze the first `n` layers (those closest to the input), so they are not updated during training. ResNet50 has 175 layers, and I tried with `n` = 0, 10, 30, 50, 80, 175. In all these cases the model underfits, reaching an accuracy of at most 0.75 on the training set and at most 0.51 on the validation set.
2. Freeze all the batch-normalization layers, plus the first `n` layers (as before), with `n` = 0, 10, 30, 50. In these cases the model overfits, reaching more than 0.95 accuracy on the training set but only around 0.5 on the validation set.
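To make the two schemes concrete, here is a plain-Python sketch of which layer indices end up frozen under each approach, independent of Keras (the layer names below are made up; in the real model they come from `inc_model.layers`):

```python
def frozen_indices(layer_names, n, freeze_bn=False):
    """Return the indices of layers that would be frozen:
    the first n layers, plus (optionally) every batch-norm layer."""
    frozen = set(range(n))  # approach 1: the first n layers towards the input
    if freeze_bn:           # approach 2: additionally freeze all BN layers
        frozen |= {i for i, name in enumerate(layer_names) if 'bn' in name}
    return sorted(frozen)

# toy layer list standing in for the 175 layers of ResNet50
layers = ['input', 'conv1', 'bn_conv1', 'conv2', 'bn_conv2', 'fc']

print(frozen_indices(layers, n=2))                  # [0, 1]
print(frozen_indices(layers, n=2, freeze_bn=True))  # [0, 1, 2, 4]
```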
Please note that if I switch from ResNet50 to InceptionV3 and freeze 50 layers, I obtain more than 0.95 accuracy on both the validation and test sets.
Here is the main part of my code:
```python
from keras.applications.resnet50 import ResNet50
from keras.layers import Dense, Dropout, GlobalAveragePooling2D
from keras.models import Model
from keras.optimizers import SGD
from keras.callbacks import ModelCheckpoint

inc_model = ResNet50(weights='imagenet', include_top=False,
                     input_shape=(IMG_HEIGHT, IMG_WIDTH, 3))
print("number of layers:", len(inc_model.layers))  # 175

# Adding custom layers on top of the base model
x = inc_model.output
x = GlobalAveragePooling2D()(x)
x = Dense(1024, activation="relu")(x)
x = Dropout(0.5)(x)
x = Dense(512, activation="relu")(x)
predictions = Dense(2, activation="softmax")(x)
model_ = Model(inputs=inc_model.input, outputs=predictions)

# fine tuning 1: freeze the first n layers (here n = 30)
for layer in inc_model.layers[:30]:
    layer.trainable = False

# fine tuning 2: freeze all batch-normalization layers
for layer in inc_model.layers:
    if 'bn' in layer.name:
        layer.trainable = False

# compile the model
model_.compile(optimizer=SGD(lr=0.0001, momentum=0.9),
               loss='categorical_crossentropy',
               metrics=['accuracy'])

checkpointer = ModelCheckpoint(filepath='weights.best.inc.male.resnet.hdf5',
                               verbose=1, save_best_only=True)

hist = model_.fit_generator(train_generator,
                            validation_data=(x_valid, y_valid),
                            steps_per_epoch=TRAINING_SAMPLES // BATCH_SIZE,
                            epochs=NUM_EPOCHS,
                            callbacks=[checkpointer],
                            verbose=1)
```
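One small detail: `steps_per_epoch` expects an integer, so dividing `TRAINING_SAMPLES` by `BATCH_SIZE` only works cleanly when the former is an exact multiple of the latter. A common way to compute it is to round up so the last, possibly smaller, batch is not dropped (the sample and batch values below are hypothetical, chosen only to match the 625 steps per epoch visible in the training log):

```python
import math

def steps_per_epoch(num_samples, batch_size):
    # round up so the last, possibly partial, batch is still used
    return math.ceil(num_samples / batch_size)

print(steps_per_epoch(10000, 16))  # 625 (exact multiple)
print(steps_per_epoch(10001, 16))  # 626 (one extra partial batch)
```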
Can anyone suggest how to find a stable solution that learns something but doesn't overfit?
EDIT: the output of the training phase looks like this:
```
Epoch 1/20
625/625 [==============================] - 2473s 4s/step - loss: 0.6048 - acc: 0.6691 - val_loss: 8.0590 - val_acc: 0.5000

Epoch 00001: val_loss improved from inf to 8.05905, saving model to weights.best.inc.male.resnet.hdf5
Epoch 2/20
625/625 [==============================] - 2432s 4s/step - loss: 0.4445 - acc: 0.7923 - val_loss: 8.0590 - val_acc: 0.5000

Epoch 00002: val_loss did not improve from 8.05905
Epoch 3/20
625/625 [==============================] - 2443s 4s/step - loss: 0.3730 - acc: 0.8407 - val_loss: 8.0590 - val_acc: 0.5000

Epoch 00003: val_loss did not improve from 8.05905
```
and so on: the validation loss never improves after the first epoch.
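For clarity on the log itself: the "improved / did not improve" messages follow directly from `save_best_only=True`, which tracks the best `val_loss` seen so far and only saves when it improves. A minimal stand-alone sketch of that bookkeeping (not Keras code), fed with the constant validation losses from the log above:

```python
def checkpoint_messages(val_losses):
    """Mimic ModelCheckpoint(save_best_only=True): report, per epoch,
    whether val_loss improved on the best value seen so far."""
    best = float('inf')
    messages = []
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            messages.append(f"Epoch {epoch:05d}: val_loss improved from {best} to {loss}")
            best = loss
        else:
            messages.append(f"Epoch {epoch:05d}: val_loss did not improve from {best}")
    return messages

for msg in checkpoint_messages([8.0590, 8.0590, 8.0590]):
    print(msg)
```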