Deep learning: loss and accuracy are poor when using data augmentation


I am new to deep learning. I created a CNN model to recognize my handwritten digits, using the MNIST dataset with TensorFlow and Keras.

I don't know why my model cannot predict images of the digit 6, so I decided to try data augmentation on the MNIST dataset. This is the code I used:

import tensorflow as tf
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Dropout, Flatten, Dense
from keras.optimizers import Adam
from keras.preprocessing.image import ImageDataGenerator
from keras.utils import np_utils

(X_train, y_train), (X_test, y_test) = mnist.load_data()

# Reshape to the format a CNN expects: (batch, height, width, channels)
X_train = X_train.reshape(X_train.shape[0], X_train.shape[1], X_train.shape[2], 1).astype('float32')
X_test = X_test.reshape(X_test.shape[0], X_test.shape[1], X_test.shape[2], 1).astype('float32')

image_size = X_train.shape[1]
input_size = image_size * image_size

# Scale pixel values to [0, 1]
X_train /= 255
X_test /= 255

number_of_classes = 10
y_train = np_utils.to_categorical(y_train, number_of_classes)
y_test = np_utils.to_categorical(y_test, number_of_classes)

batch_size = 128
epochs = 2

# create model
model = Sequential()
model.add(Conv2D(32, (3, 3), input_shape=(X_train.shape[1], X_train.shape[2], 1), activation=tf.nn.relu))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Conv2D(64, (3, 3), activation=tf.nn.relu))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.5))
model.add(Flatten())
model.add(Dense(128, activation=tf.nn.relu))
model.add(Dense(50, activation=tf.nn.relu))
#model.add(Dropout(0.5))
model.add(Dense(number_of_classes, activation=tf.nn.softmax))


# Compile model
model.compile(
    loss='categorical_crossentropy', 
    optimizer=Adam(), 
    metrics=['accuracy']
)


datagen = ImageDataGenerator(
        featurewise_center=True,
        featurewise_std_normalization=True,
        rotation_range=20,
        width_shift_range=0.2,
        height_shift_range=0.2,
        horizontal_flip=False)

# Compute the featurewise mean/std statistics from the training set
datagen.fit(X_train)



for e in range(epochs):
    print('Epoch', e)
    batches = 0
    for x_batch, y_batch in datagen.flow(X_train, y_train, batch_size=batch_size):
        #x_batch = np.reshape(x_batch, [-1, image_size*image_size])
        model.fit(x_batch, y_batch)
        batches += 1
        if batches >= len(X_train) / batch_size:
            # we need to break the loop by hand because
            # the generator loops indefinitely
            break

# Score trained model.
scores = model.evaluate(X_test,
                        y_test,
                        batch_size=batch_size,
                        verbose=False)
print('Test loss:', scores[0])
print('Test accuracy: %0.1f%%' % (100 * scores[1]) )

During training, the loss and accuracy look fine: loss: 0.0917 - accuracy: 0.9688. But when I evaluate on the test data, the results are much worse:

Test loss: 2.3218765258789062
Test accuracy: 11.2%

Does anyone know what the problem is and how I can improve this?


2 Answers

Frightera (accepted answer):

That's called overfitting: you are memorizing your training data. Your model does pretty well on data it has seen, but it is unable to predict unseen data.

Generally speaking, there are 3 types of dataset splits:

  • Training
  • Validation
  • Test

You use the validation data to tune your hyperparameters, and you test with unseen data, which is the test split. Test splits do not get augmented, because we want them to represent real-world data: users will not augment their data to get a prediction. Data augmentation is a good way to show more variety of data to your neural network, and it can increase your test accuracy significantly if you use it right. You could have used horizontal_flip=True to see whether it affects your test accuracy or not.
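As a rough sketch of that split (the use of scikit-learn's train_test_split and the variable names here are illustrative, not from the question):

from sklearn.model_selection import train_test_split

# Hold out 10% of the training data for validation (illustrative split).
X_tr, X_val, y_tr, y_val = train_test_split(
    X_train, y_train, test_size=0.1, random_state=42)

# Fit augmentation statistics on the training portion only;
# the validation and test sets stay un-augmented.
datagen.fit(X_tr)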

So, coming back to your question: you need to regularize your model in order to prevent overfitting. That is actually what your dropout layer does; it makes the model less complex. I also see you are training for only 2 epochs, which is not sufficient to fit your model to the data and get good predictions. You need to train longer as well as regularize the model, or you can make the model less complex; that is up to you.
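For example, instead of the manual batch loop, you can pass the generator to fit directly and train for more epochs (the epoch count below is illustrative, and on older standalone Keras the call is fit_generator rather than fit):

history = model.fit(
    datagen.flow(X_tr, y_tr, batch_size=batch_size),
    steps_per_epoch=len(X_tr) // batch_size,
    epochs=20,                        # illustrative; 2 epochs is too few
    validation_data=(X_val, y_val),   # assumes the split sketched above
)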

Edit: I just realized that you set featurewise_center=True, featurewise_std_normalization=True, so you are giving your training inputs zero mean and unit variance. But your test set only gets rescaled by 1/255, so its mean is not zero; you need to apply the same preprocessing (not augmentation) to your test set. That also affects the predictions.
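A minimal sketch of that fix, using the generator's standardize method to apply the fitted mean/std to the test images before evaluating (standardize modifies the array in place, hence the copy):

# Apply the same featurewise centering/normalization that training saw.
X_test_std = datagen.standardize(X_test.copy())

scores = model.evaluate(X_test_std, y_test, batch_size=batch_size, verbose=False)
print('Test accuracy: %0.1f%%' % (100 * scores[1]))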

John Stud:

Your augmentation does not reflect the test set. If you give a NN some data, it will, for the most part, do really well "learning" from it. You decided to give it "augmented" data that is not found in the test set. So your NN learned your fake data pretty well and tuned its weights to do so, but when you showed it "real" data, those weights did very poorly at predicting it.
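If you do keep augmentation, one illustrative way to keep the augmented digits close to what the test set looks like is to dial the transforms down; the ranges below are guesses, not tuned values:

datagen = ImageDataGenerator(
    rotation_range=10,       # milder than the original 20 degrees
    width_shift_range=0.1,   # milder than the original 0.2
    height_shift_range=0.1,
    # no featurewise normalization, so train and test stay on the same scale
)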

Try a different technique. Synthetic data is rarely a great option.