I use the following snippet of code to show the scale when using PyTorch's Automatic Mixed Precision package (amp):
import torch

scaler = torch.cuda.amp.GradScaler(init_scale=65536.0, growth_interval=1)
print(scaler.get_scale())  # called once per training iteration
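For context, the print runs once per optimizer step inside the standard amp training loop, roughly like this (a minimal sketch; model, criterion, optimizer, and loader are placeholders for my actual setup):

for inputs, targets in loader:                    # placeholder DataLoader
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():
        loss = criterion(model(inputs), targets)  # placeholder model and loss
    scaler.scale(loss).backward()                 # backward pass on the scaled loss
    scaler.step(optimizer)                        # skips the step if grads contain inf/NaN
    scaler.update()                               # a skipped step halves the scale (backoff_factor=0.5)
    print(scaler.get_scale())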
and this is the output that I get:
...
65536.0
32768.0
16384.0
8192.0
4096.0
...
1e-xxx
...
0
0
0
After this step every loss value becomes NaN, while the scale stays at 0. As far as I understand, GradScaler only backs off the scale when the gradients contain inf/NaN values, so it looks like every single step is overflowing. What's wrong with my loss function or training data?
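In case it helps narrow this down, this is the kind of check I plan to add (a sketch reusing the placeholder names from the loop above):

with torch.cuda.amp.autocast():
    loss = criterion(model(inputs), targets)
if not torch.isfinite(loss):                      # is the loss itself already non-finite?
    print("non-finite loss before scaling:", loss)

scaler.scale(loss).backward()
scaler.unscale_(optimizer)                        # bring grads back to their true range
for name, p in model.named_parameters():
    if p.grad is not None and not torch.isfinite(p.grad).all():
        print("non-finite grad in", name)
scaler.step(optimizer)                            # step() detects grads were already unscaled
scaler.update()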