Higher error with smaller batch size in PyTorch


In stochastic gradient descent, within a single epoch, a smaller batch size should give a lower error, since a smaller batch size means more parameter updates per epoch. However, in a recent experiment using PyTorch, when I decreased the batch size from 512 to 32, the loss after the first epoch increased. Is this possible? What does it say about the training process? Is gradient descent diverging, or is something else going on?
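For concreteness, here is a minimal sketch of the kind of comparison described, using an assumed small MLP on random data with plain SGD and a fixed learning rate; the model, data, and hyperparameters are illustrative placeholders, not the actual experiment:

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical data and model, only to illustrate the comparison.
torch.manual_seed(0)
X = torch.randn(10_000, 20)
y = torch.randn(10_000, 1)
dataset = TensorDataset(X, y)

def first_epoch_loss(batch_size):
    torch.manual_seed(0)  # same initialization for a fair comparison
    model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 1))
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    loss_fn = nn.MSELoss()
    loader = DataLoader(dataset, batch_size=batch_size, shuffle=True)

    total, n = 0.0, 0
    for xb, yb in loader:  # one pass over the data = one epoch
        optimizer.zero_grad()
        loss = loss_fn(model(xb), yb)
        loss.backward()
        optimizer.step()
        total += loss.item() * xb.size(0)  # accumulate sample-weighted loss
        n += xb.size(0)
    return total / n  # average training loss over the first epoch

for bs in (512, 32):
    print(f"batch_size={bs}: epoch-1 avg loss {first_epoch_loss(bs):.4f}")
```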
