CUDA out of memory when running Bert with Pytorch (Previously worked)


I am building a BERT binary classification on SageMaker using Pytorch.

Previously when I ran the model, I set the batch size to 16 and it trained successfully. However, after I stopped the SageMaker instance yesterday and restarted it this morning, I can no longer run the model with a batch size of 16; it fails with a CUDA out-of-memory error. I can run it with a batch size of 8, but then it doesn't produce the same result (of course). I didn't change anything else in between, and all other settings are the same, except that I increased the SageMaker volume from 30 GB to 200 GB.

Does anyone know what might be causing this problem? I really want to reproduce the result with a batch size of 16.
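For context, I'm aware that gradient accumulation can keep the *effective* batch size at 16 while only fitting micro-batches of 8 in GPU memory at a time, which might let me reproduce the batch-16 gradients. Here is a minimal CPU sketch of what I mean (a toy `nn.Linear` stands in for the BERT classifier; the model, optimizer, and data are placeholders, not my actual training code):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(4, 1)                    # stand-in for the BERT classifier
opt = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.MSELoss()

x = torch.randn(16, 4)                     # one "logical" batch of 16
y = torch.randn(16, 1)

accum_steps = 2                            # 2 micro-batches of 8 == effective batch of 16
opt.zero_grad()
for i in range(accum_steps):
    xb = x[i * 8:(i + 1) * 8]
    yb = y[i * 8:(i + 1) * 8]
    # scale the mean loss so the summed gradients match a single batch of 16
    loss = loss_fn(model(xb), yb) / accum_steps
    loss.backward()                        # gradients accumulate across micro-batches
opt.step()                                 # one optimizer step per logical batch
```

With a mean-reduced loss, dividing each micro-batch loss by `accum_steps` makes the accumulated gradient equal to the gradient of a single batch of 16, so (up to nondeterminism in CUDA kernels) the training trajectory should match.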

Any answer would help, and thank you in advance!
