I am using a basic fully connected neural network on the MNIST dataset, but my model gets stuck at about 42% accuracy on the validation data.
The data is in CSV format: 60000 rows (for the training data) and 785 columns, the first column being the label.
The following is the code that segments and converts the CSV data into 28x28 images:
import pandas as pd
import numpy as np
import tensorflow as tf

df = pd.read_csv('mnist_train.csv')
dff = pd.read_csv('mnist_test.csv')

# train set
label = np.array(df.iloc[:,0])
data = np.array(df.iloc[:,1:])
sep = []
for i in range(60000):
    temp = []
    for j in range(28):
        temp.append(data[i,j*28:(j+1)*28])
    sep.append(temp)
sep = np.array(sep)
for i in range(60000):
    for j in range(28):
        for k in range(28):
            sep[i,j,k] = sep[i,j,k]/255
labels_array = []
for i in label:
    if i==0:
        labels_array.append([1,0,0,0,0,0,0,0,0,0])
    if i==1:
        labels_array.append([0,1,0,0,0,0,0,0,0,0])
    if i==2:
        labels_array.append([0,0,1,0,0,0,0,0,0,0])
    if i==3:
        labels_array.append([0,0,0,1,0,0,0,0,0,0])
    if i==4:
        labels_array.append([0,0,0,0,1,0,0,0,0,0])
    if i==5:
        labels_array.append([0,0,0,0,0,1,0,0,0,0])
    if i==6:
        labels_array.append([0,0,0,0,0,0,1,0,0,0])
    if i==7:
        labels_array.append([0,0,0,0,0,0,0,1,0,0])
    if i==8:
        labels_array.append([0,0,0,0,0,0,0,0,1,0])
    if i==9:
        labels_array.append([0,0,0,0,0,0,0,0,0,1])
labels_array = np.array(labels_array)
# test set
label_t = np.array(dff.iloc[:,0])
data_t = np.array(dff.iloc[:,1:])
sep_t = []
for i in range(10000):
    temp = []
    for j in range(28):
        temp.append(data_t[i,j*28:(j+1)*28])
    sep_t.append(temp)
sep_t = np.array(sep_t)
for i in range(10000):
    for j in range(28):
        for k in range(28):
            sep_t[i,j,k] = sep_t[i,j,k]/255
labels_array_t = []
for i in label_t:
    if i==0:
        labels_array_t.append([1,0,0,0,0,0,0,0,0,0])
    if i==1:
        labels_array_t.append([0,1,0,0,0,0,0,0,0,0])
    if i==2:
        labels_array_t.append([0,0,1,0,0,0,0,0,0,0])
    if i==3:
        labels_array_t.append([0,0,0,1,0,0,0,0,0,0])
    if i==4:
        labels_array_t.append([0,0,0,0,1,0,0,0,0,0])
    if i==5:
        labels_array_t.append([0,0,0,0,0,1,0,0,0,0])
    if i==6:
        labels_array_t.append([0,0,0,0,0,0,1,0,0,0])
    if i==7:
        labels_array_t.append([0,0,0,0,0,0,0,1,0,0])
    if i==8:
        labels_array_t.append([0,0,0,0,0,0,0,0,1,0])
    if i==9:
        labels_array_t.append([0,0,0,0,0,0,0,0,0,1])
labels_array_t = np.array(labels_array_t)
and the following is the network itself:
Dense = tf.keras.layers.Dense

fc_model = tf.keras.Sequential([
    tf.keras.Input(shape=(28,28)),
    tf.keras.layers.Flatten(),
    Dense(128, activation='relu'),
    Dense(32, activation='relu'),
    Dense(10, activation='softmax')])

fc_model.compile(optimizer="Adam", loss="categorical_crossentropy", metrics=["accuracy"])
history = fc_model.fit(sep, labels_array, batch_size=128, validation_data=(sep_t, labels_array_t), epochs=35)
the following is the result I get:
Train on 60000 samples, validate on 10000 samples
Epoch 1/35
60000/60000 [==============================] - 2s 31us/sample - loss: 1.8819 - accuracy: 0.3539 - val_loss: 1.6867 - val_accuracy: 0.4068
Epoch 2/35
60000/60000 [==============================] - 1s 23us/sample - loss: 1.6392 - accuracy: 0.4126 - val_loss: 1.6407 - val_accuracy: 0.4098
Epoch 3/35
60000/60000 [==============================] - 1s 23us/sample - loss: 1.5969 - accuracy: 0.4224 - val_loss: 1.6202 - val_accuracy: 0.4196
Epoch 4/35
60000/60000 [==============================] - 1s 23us/sample - loss: 1.5735 - accuracy: 0.4291 - val_loss: 1.6158 - val_accuracy: 0.4220
Epoch 5/35
60000/60000 [==============================] - 1s 25us/sample - loss: 1.5561 - accuracy: 0.4324 - val_loss: 1.6089 - val_accuracy: 0.4229
Epoch 6/35
60000/60000 [==============================] - 1s 24us/sample - loss: 1.5423 - accuracy: 0.4377 - val_loss: 1.6074 - val_accuracy: 0.4181
Epoch 7/35
60000/60000 [==============================] - 2s 25us/sample - loss: 1.5309 - accuracy: 0.4416 - val_loss: 1.6053 - val_accuracy: 0.4226
Epoch 8/35
60000/60000 [==============================] - 1s 24us/sample - loss: 1.5207 - accuracy: 0.4435 - val_loss: 1.6019 - val_accuracy: 0.4252
Epoch 9/35
60000/60000 [==============================] - 1s 23us/sample - loss: 1.5111 - accuracy: 0.4480 - val_loss: 1.6015 - val_accuracy: 0.4233
Epoch 10/35
60000/60000 [==============================] - 1s 23us/sample - loss: 1.5020 - accuracy: 0.4517 - val_loss: 1.6038 - val_accuracy: 0.4186
Epoch 11/35
60000/60000 [==============================] - 1s 24us/sample - loss: 1.4954 - accuracy: 0.4530 - val_loss: 1.6096 - val_accuracy: 0.4209
Epoch 12/35
60000/60000 [==============================] - 1s 24us/sample - loss: 1.4885 - accuracy: 0.4554 - val_loss: 1.6003 - val_accuracy: 0.4278
Epoch 13/35
60000/60000 [==============================] - 1s 23us/sample - loss: 1.4813 - accuracy: 0.4573 - val_loss: 1.6072 - val_accuracy: 0.4221
Epoch 14/35
60000/60000 [==============================] - 1s 24us/sample - loss: 1.4749 - accuracy: 0.4598 - val_loss: 1.6105 - val_accuracy: 0.4242
Epoch 15/35
60000/60000 [==============================] - 1s 23us/sample - loss: 1.4693 - accuracy: 0.4616 - val_loss: 1.6160 - val_accuracy: 0.4213
Epoch 16/35
60000/60000 [==============================] - 1s 23us/sample - loss: 1.4632 - accuracy: 0.4626 - val_loss: 1.6149 - val_accuracy: 0.4266
Epoch 17/35
60000/60000 [==============================] - 1s 23us/sample - loss: 1.4580 - accuracy: 0.4642 - val_loss: 1.6145 - val_accuracy: 0.4267
Epoch 18/35
60000/60000 [==============================] - 1s 23us/sample - loss: 1.4532 - accuracy: 0.4656 - val_loss: 1.6169 - val_accuracy: 0.4330
Epoch 19/35
60000/60000 [==============================] - 1s 24us/sample - loss: 1.4479 - accuracy: 0.4683 - val_loss: 1.6198 - val_accuracy: 0.4236
Epoch 20/35
60000/60000 [==============================] - 1s 24us/sample - loss: 1.4436 - accuracy: 0.4693 - val_loss: 1.6246 - val_accuracy: 0.4264
Epoch 21/35
60000/60000 [==============================] - 1s 23us/sample - loss: 1.4389 - accuracy: 0.4713 - val_loss: 1.6300 - val_accuracy: 0.4254
Epoch 22/35
60000/60000 [==============================] - 1s 23us/sample - loss: 1.4350 - accuracy: 0.4730 - val_loss: 1.6296 - val_accuracy: 0.4258
Epoch 23/35
60000/60000 [==============================] - 1s 23us/sample - loss: 1.4328 - accuracy: 0.4727 - val_loss: 1.6279 - val_accuracy: 0.4257
Epoch 24/35
60000/60000 [==============================] - 1s 23us/sample - loss: 1.4282 - accuracy: 0.4742 - val_loss: 1.6327 - val_accuracy: 0.4209
Epoch 25/35
60000/60000 [==============================] - 1s 23us/sample - loss: 1.4242 - accuracy: 0.4745 - val_loss: 1.6387 - val_accuracy: 0.4256
Epoch 26/35
60000/60000 [==============================] - 1s 23us/sample - loss: 1.4210 - accuracy: 0.4765 - val_loss: 1.6418 - val_accuracy: 0.4240
Epoch 27/35
60000/60000 [==============================] - 1s 23us/sample - loss: 1.4189 - accuracy: 0.4773 - val_loss: 1.6438 - val_accuracy: 0.4237
Epoch 28/35
60000/60000 [==============================] - 1s 23us/sample - loss: 1.4151 - accuracy: 0.4781 - val_loss: 1.6526 - val_accuracy: 0.4184
Epoch 29/35
60000/60000 [==============================] - 1s 25us/sample - loss: 1.4129 - accuracy: 0.4788 - val_loss: 1.6572 - val_accuracy: 0.4190
Epoch 30/35
60000/60000 [==============================] - 1s 24us/sample - loss: 1.4097 - accuracy: 0.4801 - val_loss: 1.6535 - val_accuracy: 0.4225
Epoch 31/35
60000/60000 [==============================] - 1s 24us/sample - loss: 1.4070 - accuracy: 0.4795 - val_loss: 1.6689 - val_accuracy: 0.4188
Epoch 32/35
60000/60000 [==============================] - 1s 23us/sample - loss: 1.4053 - accuracy: 0.4809 - val_loss: 1.6663 - val_accuracy: 0.4194
Epoch 33/35
60000/60000 [==============================] - 1s 23us/sample - loss: 1.4029 - accuracy: 0.4831 - val_loss: 1.6618 - val_accuracy: 0.4220
Epoch 34/35
60000/60000 [==============================] - 1s 23us/sample - loss: 1.4000 - accuracy: 0.4832 - val_loss: 1.6603 - val_accuracy: 0.4270
Epoch 35/35
60000/60000 [==============================] - 1s 23us/sample - loss: 1.3979 - accuracy: 0.4845 - val_loss: 1.6741 - val_accuracy: 0.4195
Could this be only because of the optimizer? I tried SGD, but to no avail!
TLDR; Change the loss to categorical_crossentropy.
The optimizer is not the issue here.
The immediate issue I can see is that, for a multi-class classification problem, you are using mse as the loss. Please change it to categorical_crossentropy; that should get you better numbers. Also, don't forget to remove mse from the metrics as well.
For future reference, you can use the usual best-practice pairings of output activation and loss: sigmoid with binary_crossentropy for binary (or multi-label) classification, softmax with categorical_crossentropy for single-label multi-class classification, and a linear output with mse for regression. It would be even better to spend some time researching why each of these activations and loss functions is used for its specific problem mathematically.
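As a minimal sketch of that change (assuming fc_model is the Sequential model defined in the question, trained on one-hot labels), the compile call would look like this:

# Swap the loss to categorical_crossentropy and keep only accuracy in the metrics (no mse).
fc_model.compile(optimizer="Adam",
                 loss="categorical_crossentropy",  # matches one-hot labels + softmax output
                 metrics=["accuracy"])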
Note: As another side note, even though this does not impact performance, you don't need to convert the labels to one-hot vectors. Instead, you can use the original label or label_t directly as your y_train, and change the loss from categorical_crossentropy to sparse_categorical_crossentropy.
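For illustration, a sketch of that variant, assuming the same sep, sep_t, label and label_t arrays from the question:

fc_model.compile(optimizer="Adam",
                 loss="sparse_categorical_crossentropy",  # expects integer class labels
                 metrics=["accuracy"])
history = fc_model.fit(sep, label,                        # raw integer labels, no one-hot encoding
                       batch_size=128,
                       validation_data=(sep_t, label_t),
                       epochs=35)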
EDIT:
Based on your comments, and the testing I did on another MNIST dataset, please try the following:
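The code this edit refers to is not included above. As a rough, self-contained sketch of the kind of pipeline being suggested (vectorized reshaping and normalization, integer labels with sparse_categorical_crossentropy, and the same small dense network), one possible version is shown below. The file names follow the question; note that the normalization here happens on a float array, so pixel values keep their fractional parts instead of being truncated, as happens when value/255 is assigned back into an integer NumPy array.

import pandas as pd
import numpy as np
import tensorflow as tf

# Load the CSVs (first column is the label, the remaining 784 columns are pixels).
df = pd.read_csv('mnist_train.csv')
dff = pd.read_csv('mnist_test.csv')

# Integer class labels; no one-hot encoding needed with sparse_categorical_crossentropy.
y_train = df.iloc[:, 0].to_numpy()
y_test = dff.iloc[:, 0].to_numpy()

# Reshape the 784 pixel columns into 28x28 images and normalize as floats in [0, 1].
x_train = df.iloc[:, 1:].to_numpy().reshape(-1, 28, 28).astype('float32') / 255.0
x_test = dff.iloc[:, 1:].to_numpy().reshape(-1, 28, 28).astype('float32') / 255.0

fc_model = tf.keras.Sequential([
    tf.keras.Input(shape=(28, 28)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dense(32, activation='relu'),
    tf.keras.layers.Dense(10, activation='softmax')])

fc_model.compile(optimizer='adam',
                 loss='sparse_categorical_crossentropy',
                 metrics=['accuracy'])

history = fc_model.fit(x_train, y_train,
                       batch_size=128,
                       validation_data=(x_test, y_test),
                       epochs=35)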