I am trying to train a DenseNet with hinge loss on the CIFAR-100 dataset. Training converges to a certain point and then stops making progress, and the final accuracy is much lower than that of the same DenseNet trained with the cross-entropy loss. I have tried different learning rates and weight decays.
Any ideas why I am unable to train DenseNet properly with hinge loss? Hinge loss works with ResNet without any problem.
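For reference, here is a minimal sketch of the multi-class hinge loss under two assumptions: that the framework is PyTorch (the post does not say), and that "hinge loss" means the Weston-Watkins form computed by `torch.nn.MultiMarginLoss`. `multiclass_hinge` is a hypothetical helper name. It is written out by hand so the margin terms are explicit, and it is checked against the built-in loss.

```python
import torch
import torch.nn as nn


def multiclass_hinge(logits: torch.Tensor, targets: torch.Tensor, margin: float = 1.0) -> torch.Tensor:
    """Hypothetical helper: multi-class hinge loss in the form used by nn.MultiMarginLoss (p=1).

    logits:  (N, C) raw class scores from the network (no softmax applied)
    targets: (N,) integer class indices
    """
    num_classes = logits.size(1)
    # Score of the true class for each sample, shape (N, 1)
    correct = logits.gather(1, targets.unsqueeze(1))
    # Margin violation for every class: max(0, margin - s_y + s_j)
    violations = (margin - correct + logits).clamp(min=0)
    # The true class must not contribute to its own loss
    violations = violations.scatter(1, targets.unsqueeze(1), 0.0)
    # Sum over classes, divide by C (as MultiMarginLoss does), then average over the batch
    return (violations.sum(dim=1) / num_classes).mean()


if __name__ == "__main__":
    torch.manual_seed(0)
    logits = torch.randn(8, 100)            # e.g. a batch of 8 CIFAR-100 predictions
    targets = torch.randint(0, 100, (8,))
    ours = multiclass_hinge(logits, targets)
    ref = nn.MultiMarginLoss(margin=1.0)(logits, targets)
    print(torch.allclose(ours, ref))
```

One property worth noting: this formulation divides the summed margin violations by the number of classes C (100 for CIFAR-100), so its loss values and gradient magnitudes are on a different scale from cross-entropy at the same learning rate.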