A few days ago I finished writing a word prediction program that compares LSTM and GRU models on a given dataset. I test 4 models in total: 2 LSTM models and 2 GRU models. I wrote the program on Google Colab.
I evaluate on two validation sets to see how the choice of validation set affects perplexity. When I left it, I was quite satisfied with the results. Now, a few days later, when I came back to run it, I see that I randomly get extremely large perplexities on the first epoch for at least one of the validation sets, and when that happens, it happens for all 4 models. However, if I stop the program after the first epoch and immediately run it again, the perplexity issue on the first epoch (and consequently for the rest of the run) sometimes goes away. I can repeat this again and again until all 3 datasets (the training set and both validation sets) give normal results.
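For context, each epoch I compute one perplexity value on the training set and one per validation set. Roughly, the evaluation looks like the sketch below (simplified and PyTorch-style; `model`, `criterion`, and the loader names are placeholders, and I'm taking perplexity as the exp of the mean per-token cross-entropy, ignoring padding concerns):

```python
import math
import torch

def evaluate_perplexity(model, data_loader, criterion, device):
    """Perplexity = exp(mean per-token cross-entropy) over one dataset."""
    model.eval()
    total_loss, total_tokens = 0.0, 0
    with torch.no_grad():
        for inputs, targets in data_loader:
            inputs, targets = inputs.to(device), targets.to(device)
            logits = model(inputs)  # expected shape: (batch, seq_len, vocab_size)
            loss = criterion(logits.reshape(-1, logits.size(-1)), targets.reshape(-1))
            total_loss += loss.item() * targets.numel()  # criterion uses mean reduction
            total_tokens += targets.numel()
    return math.exp(total_loss / total_tokens)

# Called once per epoch for each of the 3 datasets:
# train_ppl = evaluate_perplexity(model, train_loader, criterion, device)
# val1_ppl  = evaluate_perplexity(model, val1_loader, criterion, device)
# val2_ppl  = evaluate_perplexity(model, val2_loader, criterion, device)
```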
Could there be some Google Colab issue that's causing these inconsistent runs? I'm asking in case I can avoid digging into my code for no reason, since sometimes it just works perfectly fine.
Thanks!
EDIT: After testing it quite a few times, I noticed the following ALWAYS happens on the first epoch after I restart the runtime:
1. Training perplexity is normal (as always), but neither validation set's is. I stop and Run All again.
2. Training perplexity is normal (as always), validation set 1's is normal, validation set 2's is not. I stop and Run All again.
3. Training perplexity is normal (as always), and both validation sets' are normal. I stop and Run All again.
4. Same as 3.