tcmalloc: large alloc error using Google Colab


I'm using the following tutorial to build a neural language model on the Google Colab platform: https://machinelearningmastery.com/how-to-develop-a-word-level-neural-language-model-in-keras/.

My dataset, which contains 2036456 sequences and a vocabulary of 77069 words, is significantly larger than the one used in the tutorial, but I can still train the model on my local computer. However, when I try to train on the full dataset on Google Colab using a GPU, I get the following error:

tcmalloc: large alloc 627790512128 bytes == 0x4789c000 @ 0x7fc0aaaf8001 0x7fc0a861c765 0x7fc0a8680bb0 ...

I've managed to figure out where in the code the error is raised, and the culprit appears to be the following line, where Keras's to_categorical() function is called to one-hot encode the output words:

y = to_categorical(y, num_classes=vocab_size)

This tells me that something breaks when one-hot encoding the (rather large) vocabulary, but I don't understand why this is not an issue on my local machine.
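As a rough sanity check, here is a back-of-the-envelope estimate of the dense array that to_categorical would need to allocate for my data (assuming a float32 output, which I believe is its default dtype); the result lines up almost exactly with the size reported in the tcmalloc message:

num_sequences = 2036456
vocab_size = 77069
bytes_per_value = 4  # assuming float32 elements

total_bytes = num_sequences * vocab_size * bytes_per_value
print(total_bytes)  # 627790509856, i.e. roughly 628 GB

So the one-hot encoded array alone would apparently need hundreds of gigabytes if built in one go.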

In other threads, I've read that this might be a memory issue, and it does indeed seem like the Colab GPU instance is running out of memory (I have about 12 GB of RAM allocated). Again, this is not a problem on my local computer, a simple MacBook Pro with 16 GB of RAM, on which I can load the data and train the model (although it is painfully slow). During training with the full dataset loaded, the process takes up about 13 GB of RAM, which is not drastically different from the amount that 10% of the same dataset requires on the cloud GPU server.

What am I doing wrong? And how is it possible that my local machine can deal with the data but Google Colab cannot? Or is the issue perhaps completely unrelated to the data?

Thanks, A.
