I’m trying to solve a problem in my CNN model. I'm using a dataset that is organized as follows:

  • train
    • parasitized
    • uninfected
  • test
    • parasitized
    • uninfected
  • validation
    • parasitized
    • uninfected

My dataset is too large, so I'm using ImageDataGenerator to preprocess the images and load them in batches (to reduce memory cost). At first I configured the ImageDataGenerator as follows:

from keras.preprocessing.image import ImageDataGenerator

#Define an ImageDataGenerator for each dataset.
#This preprocessing step only rescales each image by 1/255

datagen_train = ImageDataGenerator(rescale=1./255)
datagen_test = ImageDataGenerator(rescale=1./255)
datagen_valid = ImageDataGenerator(rescale=1./255)

#Define a batch_size parameter
batch_size=32

# .flow_from_directory reads the images from the folder structure and yields batches
train_generator = datagen_train.flow_from_directory(
    'content/cell_images/train', #Train folder path 
    target_size=(150,150), #all images will be resized to 150x150
    batch_size=batch_size,
    class_mode='categorical') # We use categorical_crossentropy loss,
                              # so we need categorical labels


test_generator = datagen_test.flow_from_directory(
    'content/cell_images/test', #Test folder path
    target_size=(150,150), #all images will be resized to 150x150
    batch_size=batch_size,
    class_mode='categorical')         

valid_generator = datagen_valid.flow_from_directory(
    'content/cell_images/valid', #Validation folder path
    target_size=(150,150),
    batch_size=batch_size,
    class_mode='categorical')

To fit the model, fit_generator was used with a checkpointer to save the best weights based on the validation loss (ModelCheckpoint's default monitor):

from keras.callbacks import ModelCheckpoint

# Define epochs number
epochs = 10

# Create a checkpointer to save only the best params
checkpointer = ModelCheckpoint(filepath='cnn_model.weights.best.hdf5', 
                          verbose=1, save_best_only=True)

model.fit_generator(train_generator,
               steps_per_epoch=train_generator.samples//batch_size,
               epochs=epochs,
               callbacks=[checkpointer],
               validation_data=valid_generator,
               validation_steps=valid_generator.samples//batch_size)

And finally, the best weights were loaded into the model, and the model was evaluated on the test set:

# load the weights that yielded the best validation loss
model.load_weights('cnn_model.weights.best.hdf5')

#evaluate and print test accuracy
score = model.evaluate_generator(test_generator,
       test_generator.samples//batch_size)
print('\n', 'Test accuracy:', score[1])

But here is my problem: each time I run only model.evaluate_generator, without retraining the model (i.e. keeping the same weights), it returns a different accuracy score.

I've been looking for a solution and reading a lot of posts for insight, and recently I made some progress.

Recently, I discovered based on this post that if I set shuffle=False and batch_size=1 in test_generator:

test_generator = datagen_test.flow_from_directory(
    'content/cell_images/test', #Test folder path
    target_size=(150,150), #all images will be resized to 150x150
    batch_size=1,
    class_mode='categorical',
    shuffle=False)

and set steps = test_generator.samples in evaluate_generator:

score = model.evaluate_generator(test_generator, test_generator.samples)

the values no longer change.
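A likely reason the steps count matters: with integer division, samples//batch_size drops the remainder, so the last partial batch is never evaluated (and with shuffling on, a different random subset is skipped on each run). A minimal arithmetic sketch, using a hypothetical sample count:

```python
# Hypothetical test-set size, just for illustration.
samples = 2756
batch_size = 32

steps = samples // batch_size      # integer division drops the remainder
evaluated = steps * batch_size     # images actually scored
skipped = samples - evaluated      # images silently skipped

print(steps, evaluated, skipped)   # 86 2752 4

# With batch_size=1 and steps=samples, every image is scored exactly once.
print(samples // 1)                # 2756
```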

I was also investigating the effect of the rescale=1./255, based on this post. For this, I used a ModelCheckpoint callback to save the weights only for the best validation score. Afterwards, I loaded the best weights into the model and evaluated it with model.evaluate_generator, as described above. To check the score's consistency, I also evaluated on the validation set, to verify whether the value reported by the callback for the best weights is the same as the one returned by evaluate_generator. Before running evaluate_generator on the validation set, I used the same parameters as for the test set:

valid_generator = datagen_valid.flow_from_directory(
    'content/cell_images/valid',
    target_size=(150,150),
    batch_size=1,
    class_mode='categorical',
    shuffle=False)


#evaluate and print validation accuracy
score = model.evaluate_generator(valid_generator, 
        valid_generator.samples)
print('\n', 'Valid accuracy:', score[1])

#evaluate and print test accuracy
score = model.evaluate_generator(test_generator, 
        test_generator.samples)
print('\n', 'Test accuracy:', score[1])

Curiously, I noticed the following:

When I don't use the rescale (1./255):

datagen_train = ImageDataGenerator()

datagen_test = ImageDataGenerator()

datagen_valid = ImageDataGenerator()

the validation score displayed by the callback (0.5) is exactly the same as the one obtained from model.evaluate_generator (0.5). The test set also returns an accuracy score of 0.5, which is chance level for a balanced two-class problem, suggesting the model doesn't learn at all without rescaling.
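One plausible explanation for the stuck 0.5 accuracy (a sketch, not verified on this exact model): raw pixel values in [0, 255] push pre-activations deep into the saturated tail of squashing nonlinearities, so gradients vanish and the network never moves away from its initial prediction. A small numeric illustration with a sigmoid and a hypothetical weight:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)  # derivative of the sigmoid

w = 0.1  # an illustrative small initial weight

# Rescaled pixel (128/255): pre-activation near zero, gradient ~0.25.
g_scaled = sigmoid_grad(w * 128 / 255.0)

# Raw pixel (128): pre-activation 12.8, gradient ~3e-6 (saturated).
g_raw = sigmoid_grad(w * 128)

print(g_scaled, g_raw)
```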

When I use the rescale (1./255):

datagen_train = ImageDataGenerator(rescale=1./255)

datagen_test = ImageDataGenerator(rescale=1./255)

datagen_valid = ImageDataGenerator(rescale=1./255)

the difference between the validation score displayed by the callback (0.9515):

Epoch 7/10
688/688 [==============================] - 67s 97ms/step - loss: 
0.2017 - acc: 0.9496 - val_loss: 0.1767 - val_acc: 0.9515

Epoch 00007: val_loss improved from 0.19304 to 0.17671, saving model 
to cnn_model.weights.best.hdf5

and the score obtained from model.evaluate_generator (Valid accuracy: 0.9466618287373004) is very small. Using the test set: Test accuracy: 0.9078374455732946.

Based on this small difference between validation scores, can I infer that evaluate_generator is working correctly? And can I infer that the accuracy score on the test set is also correct? Or is there another approach to solve this problem?
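One extra sanity check (a sketch, not the original code): with shuffle=False you can compute accuracy yourself from the model's predictions and generator.classes, and compare the result with the evaluate_generator score. Here the real predictions are replaced by hypothetical arrays:

```python
import numpy as np

# Hypothetical stand-ins. In practice these would be:
#   preds  = model.predict_generator(test_generator, test_generator.samples)
#   labels = test_generator.classes  # label order matches only with shuffle=False
preds = np.array([[0.9, 0.1],
                  [0.2, 0.8],
                  [0.6, 0.4],
                  [0.3, 0.7]])
labels = np.array([0, 1, 1, 1])

# Accuracy = fraction of argmax predictions matching the true labels.
accuracy = np.mean(np.argmax(preds, axis=1) == labels)
print(accuracy)  # 0.75 on this toy data
```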

I'm getting frustrated with this problem. Sorry for the long post; I'm trying to be as clear as I can.

Thanks!
