I am working on a deep learning (CNN + AEs) approach on facial images.
I have:
- an input layer of 112*112*3 (the facial images)
- 3 blocks of convolution + max pooling + ReLU
- 2 fully connected layers of 512 neurons each, with 50% dropout to avoid overfitting
- a final output layer of 10 neurons, since I have 10 classes

As the loss I use the mean (reduce_mean) of the softmax cross-entropy, plus L2 regularization.
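As a rough illustration of the loss described above (plain Python, not the actual TensorFlow code of the project), the sketch below computes the mean softmax cross-entropy and adds an L2 term. The `scale` value, the sample weights and the labels are made up for the example. It also shows a handy sanity check: with 10 classes, a model that guesses uniformly has a cross-entropy of ln(10), roughly 2.30, so a loss stuck near that value suggests the network is not learning at all.

```python
import math

def softmax(logits):
    """Numerically stable softmax over one list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def mean_softmax_cross_entropy(batch_logits, labels):
    """Mean over the batch of -log(softmax probability of the true class)."""
    losses = [-math.log(softmax(logits)[label])
              for logits, label in zip(batch_logits, labels)]
    return sum(losses) / len(losses)

def l2_penalty(weights, scale):
    """scale * sum(w^2) / 2 over a flat list of weights (scale is hypothetical)."""
    return scale * sum(w * w for w in weights) / 2.0

NUM_CLASSES = 10
# Identical logits for every class means the model is guessing uniformly.
uniform_logits = [[0.0] * NUM_CLASSES for _ in range(4)]
labels = [0, 3, 7, 9]
chance_loss = mean_softmax_cross_entropy(uniform_logits, labels)
total_loss = chance_loss + l2_penalty([0.1, -0.2, 0.3], scale=1e-3)
print(round(chance_loss, 4))  # ln(10), i.e. 2.3026
```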
I divided my dataset into 3 groups:
- 60% for training
- 20% for validation
- 20% for evaluation
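A 60/20/20 split like the one above can be sketched as follows. This is only a minimal plain-Python example; the function name `split_dataset`, the seed and the use of integer IDs in place of images are assumptions for illustration.

```python
import random

def split_dataset(samples, train_frac=0.6, val_frac=0.2, seed=0):
    """Shuffle once, then cut into train / validation / evaluation parts."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_frac)
    n_val = round(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # whatever remains, about 20%
    return train, val, test

# Integer IDs stand in for the facial images here.
train, val, test = split_dataset(range(1000))
print(len(train), len(val), len(test))  # 600 200 200
```

Shuffling before cutting matters if the files are stored grouped by class; otherwise one of the three groups could end up missing some classes entirely.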
The problem is that after a few epochs the validation error rate stays at a fixed value and never changes. I am using TensorFlow to implement my project.
I haven't had this problem with CNNs before, so this is the first time I have seen it. I have checked the code, which is based on the TensorFlow documentation, so I don't think the problem is in the code. Maybe I need to change some parameters, but I am not sure.
Any ideas about common solutions to this kind of problem?
Update: I changed the optimizer from momentum to Adam with the default learning rate. The validation error now changes, but it is lower than the mini-batch error most of the time, even though both use the same batch size.
I have tested the model with and without biases (initialized to 0.1), but there is still no good fit.
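One possible explanation for the validation error being lower than the mini-batch error, assuming dropout is switched off when the validation batches are evaluated (the usual setup), is that the training error is measured on a network with about half of the fully connected units dropped, while the validation error is measured on the full network. A minimal plain-Python sketch of that difference, with made-up activations:

```python
import random

def dropout(activations, rate, training, rng):
    """Inverted dropout: while training, zero each unit with probability `rate`
    and scale the survivors by 1 / (1 - rate); otherwise pass values through."""
    if not training:
        return list(activations)
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

rng = random.Random(0)
acts = [1.0] * 512  # hypothetical activations of one 512-unit layer
train_out = dropout(acts, rate=0.5, training=True, rng=rng)
eval_out = dropout(acts, rate=0.5, training=False, rng=rng)
dropped = train_out.count(0.0)
print(dropped, "of 512 units zeroed in training mode")
print(eval_out == acts)  # True: evaluation uses every unit
```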
Update: I fixed the issue. I will update with more details soon.
One common solution that I found helpful for this type of problem is TensorBoard. You can add summaries that visualize training performance after each epoch at different points in the computational graph. Adding key metrics is worth it, since you can see how training progresses after you change the adaptive learning rate, batch size, neural network architecture, dropout / regularization, number of GPUs, etc.
Here is the link that I found helpful to add these details: https://www.tensorflow.org/how_tos/graph_viz/#runtime_statistics
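A minimal sketch of logging per-epoch metrics for TensorBoard is shown below. It assumes the TensorFlow 2.x summary API (`tf.summary.create_file_writer` and `tf.summary.scalar`), which differs from the graph-mode API the linked page was written for. The helper `log_metrics`, the metric names and the numbers are hypothetical.

```python
import os
import tempfile

import tensorflow as tf

def log_metrics(logdir, history):
    """Write one scalar summary per metric and epoch.

    `history` is a list with one dict per epoch,
    e.g. {"train_error": 0.4, "val_error": 0.5}.
    """
    writer = tf.summary.create_file_writer(logdir)
    with writer.as_default():
        for epoch, metrics in enumerate(history):
            for name, value in metrics.items():
                tf.summary.scalar(name, value, step=epoch)
    writer.flush()
    writer.close()

# Made-up numbers, only to show the call.
example_history = [
    {"train_error": 0.62, "val_error": 0.60},
    {"train_error": 0.48, "val_error": 0.51},
]
logdir = tempfile.mkdtemp()
log_metrics(logdir, example_history)
print(os.listdir(logdir))  # contains an events.out.tfevents.* file
```

Running `tensorboard --logdir <logdir>` then plots both curves, which makes a validation error that stops moving easy to spot.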