I am new to deep learning. I created a CNN model with TensorFlow and Keras to recognize my handwritten digits using the MNIST dataset.
I don't know why my model can't predict images of the digit 6, so I decided to try data augmentation on the MNIST dataset. This is my code:
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.datasets import mnist
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.utils import to_categorical

(X_train, y_train), (X_test, y_test) = mnist.load_data()
# Reshape to the format the CNN expects: (batch, height, width, channels)
X_train = X_train.reshape(X_train.shape[0], X_train.shape[1], X_train.shape[2], 1).astype('float32')
X_test = X_test.reshape(X_test.shape[0], X_test.shape[1], X_test.shape[2], 1).astype('float32')
image_size = X_train.shape[1]
input_size = image_size * image_size
X_train /= 255
X_test /= 255
number_of_classes = 10
y_train = to_categorical(y_train, number_of_classes)
y_test = to_categorical(y_test, number_of_classes)
batch_size = 128
epochs = 2
# create model
model = Sequential()
model.add(Conv2D(32, (3, 3), input_shape=(X_train.shape[1], X_train.shape[2], 1), activation = tf.nn.relu))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Conv2D(64, (3, 3), activation = tf.nn.relu))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(keras.layers.Dropout(0.5))
model.add(keras.layers.Flatten())
model.add(keras.layers.Dense(128, activation = tf.nn.relu))
model.add(keras.layers.Dense(50, activation = tf.nn.relu))
#model.add(keras.layers.Dropout(0.5))
model.add(keras.layers.Dense(number_of_classes, activation=tf.nn.softmax))
# Compile model
model.compile(
    loss='categorical_crossentropy',
    optimizer=Adam(),
    metrics=['accuracy']
)
datagen = ImageDataGenerator(
    featurewise_center=True,
    featurewise_std_normalization=True,
    rotation_range=20,
    width_shift_range=0.2,
    height_shift_range=0.2,
    horizontal_flip=False)
datagen.fit(X_train.reshape(X_train.shape[0], 28, 28, 1))
for e in range(epochs):
    print('Epoch', e)
    batches = 0
    for x_batch, y_batch in datagen.flow(X_train, y_train, batch_size=batch_size):
        #x_batch = np.reshape(x_batch, [-1, image_size*image_size])
        model.fit(x_batch, y_batch)
        batches += 1
        if batches >= len(X_train) / 32:
            # we need to break the loop by hand because
            # the generator loops indefinitely
            break
# Score trained model.
scores = model.evaluate(X_test,
                        y_test,
                        batch_size=batch_size,
                        verbose=False)
print('Test loss:', scores[0])
print('Test accuracy: %0.1f%%' % (100 * scores[1]) )
During training, the loss and accuracy look fine: loss: 0.0917 - accuracy: 0.9688
but when I evaluate on the test data, the loss is high and the accuracy is very low:
Test loss: 2.3218765258789062
Test accuracy: 11.2%
Does anyone know what the problem is and how I can improve it?
That's called overfitting: you are memorizing your training data. You do pretty well on seen data, but your network is unable to predict unseen data.
Generally speaking, there are 3 types of dataset splits: training, validation, and test.
You fit the model on the training data, tune your hyperparameters with the validation data, and test on unseen data, which is the test split. Test splits do not get augmented, because we want them to represent real-world data; users will not augment their data to get their predictions. Data augmentation is a good way to show more variety of data to your neural network, and it can increase your test accuracy significantly if you use it correctly. You could have used
horizontal_flip = True
to see whether it affects your test accuracy or not. Turning back to your question, you need to regularize your model in order to prevent overfitting. That is actually what your dropout layer does: it makes the model less complex. I also see that you are training for only 2 epochs, which is not enough to fit the model to the data well enough for good predictions. You need to train longer as well as regularize the model, or you can make the model less complex; it is up to you.
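For example, here is a rough sketch of how you might train for more epochs while watching a held-out validation split (this assumes the same model, datagen and variables defined in your code, a tf.keras version whose model.fit accepts a generator directly, and a purely illustrative 90/10 split):
# hold out the last 10% of the training set as a validation set (illustrative split)
X_tr, y_tr = X_train[:54000], y_train[:54000]
X_val, y_val = X_train[54000:], y_train[54000:]
# fit the augmentation statistics on the training portion only
datagen.fit(X_tr)
# the validation images need the same featurewise centering/scaling as the training batches
X_val_std = datagen.standardize(X_val.copy())
history = model.fit(
    datagen.flow(X_tr, y_tr, batch_size=batch_size),
    steps_per_epoch=len(X_tr) // batch_size,
    epochs=20,  # more than 2 epochs; tune as needed
    validation_data=(X_val_std, y_val))
If the training accuracy keeps climbing while the validation accuracy stalls or drops, the model is overfitting and needs more regularization (or a smaller model).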
Edit: I just realized that you set
featurewise_center=True,
featurewise_std_normalization=True
. This sets the mean of your training inputs to 0 (and normalizes their std). But your test set is only re-scaled by 1/255, so it still has a different mean and scale; you need to apply the same preprocessing (not the augmentation) to your test set. That also affects the predictions.
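As a small sketch of that last point (assuming the datagen from your code has already been fit on X_train), you can push the test images through the generator's standardize method so they receive the same featurewise mean/std treatment as the training batches before you evaluate:
# apply the same featurewise centering and scaling the generator uses during training
X_test_std = datagen.standardize(X_test.copy())  # copy() so X_test itself is not modified in place
scores = model.evaluate(X_test_std, y_test, batch_size=batch_size, verbose=0)
print('Test loss:', scores[0])
print('Test accuracy: %0.1f%%' % (100 * scores[1]))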