I am training VGG11 on a custom image dataset for 3-way 5-shot image classification using MAML from learn2learn. I am encapsulating the whole VGG11 model with MAML, i.e., not just the classification head. My hyperparameters are as follows:
- Meta LR: 0.001
- Fast LR: 0.5
- Adaptation steps: 1
- First order: False
- Meta Batch Size: 5
- Optimizer: AdamW
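
For reference, a minimal sketch of this setup (the `vgg11_bn` constructor from torchvision and all variable names are my assumptions, not taken from the original post):

```python
import torch
import learn2learn as l2l
from torchvision.models import vgg11_bn

# Sketch of the setup described above: wrap the entire VGG11 model,
# not just the classification head, with MAML.
model = vgg11_bn(num_classes=3)                               # 3-way classification
maml = l2l.algorithms.MAML(model, lr=0.5, first_order=False)  # fast LR, second-order MAML
opt = torch.optim.AdamW(maml.parameters(), lr=0.001)          # meta LR
```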
During training, I noticed that after taking the first outer-loop optimization step, i.e., AdamW.step(), the loss skyrockets to very large values, on the order of tens of thousands. Is this normal? I am also measuring the micro F1 score as my accuracy metric; the curves for meta-training/validation are shown below:

[figure: micro F1 curves for meta-training and meta-validation]

The curve fluctuates too much in my opinion. Is this normal, and what could be the reason for it? Thanks
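
For context, one meta-training iteration in this setup looks roughly like the sketch below, reusing `maml` and `opt` from the snippet above; `sample_task()` is a hypothetical helper that returns support/query tensors for one 3-way 5-shot task:

```python
loss_fn = torch.nn.CrossEntropyLoss()
meta_batch_size = 5

opt.zero_grad()
meta_loss = 0.0
for _ in range(meta_batch_size):
    support_x, support_y, query_x, query_y = sample_task()   # hypothetical task sampler
    learner = maml.clone()                                    # per-task copy, keeps the graph
    learner.adapt(loss_fn(learner(support_x), support_y))     # 1 inner adaptation step
    meta_loss = meta_loss + loss_fn(learner(query_x), query_y)
(meta_loss / meta_batch_size).backward()
opt.step()  # the outer-loop step after which the loss explodes
```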
I figured it out. I was using VGG11 with vanilla BatchNorm layers from PyTorch, which do not work properly in a meta-training setup. I removed the BatchNorm layers and now it works as expected.
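
For anyone hitting the same problem: BatchNorm's running statistics are shared across tasks and are not updated by the inner-loop adaptation, which is a known source of instability with MAML. A minimal sketch of the fix described above (`strip_batchnorm` is a hypothetical helper written for illustration, not code from the original post):

```python
import torch.nn as nn

def strip_batchnorm(module: nn.Module) -> None:
    # Recursively replace every BatchNorm layer with an Identity.
    for name, child in module.named_children():
        if isinstance(child, (nn.BatchNorm1d, nn.BatchNorm2d, nn.BatchNorm3d)):
            setattr(module, name, nn.Identity())
        else:
            strip_batchnorm(child)

strip_batchnorm(model)  # do this before wrapping the model with MAML
```

GroupNorm is another common drop-in substitute in meta-learning code, since it computes statistics per sample rather than per batch and so nothing leaks across tasks.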