Why is Google Colab TPU as slow as my computer?


Since I have a large dataset and not much compute power on my PC, I thought it would be a good idea to use the TPU on Google Colab.

So, here is my TPU configuration:

import tensorflow as tf

try:
    # Detect the TPU; raises ValueError if no TPU is attached to the runtime
    tpu = tf.distribute.cluster_resolver.TPUClusterResolver()
    print('Running on TPU ', tpu.master())
except ValueError:
    tpu = None

if tpu:
    tf.config.experimental_connect_to_cluster(tpu)
    tf.tpu.experimental.initialize_tpu_system(tpu)
    strategy = tf.distribute.experimental.TPUStrategy(tpu)
else:
    # Fall back to the default (single-device) strategy
    strategy = tf.distribute.get_strategy()

print("REPLICAS: ", strategy.num_replicas_in_sync)

And here is my training code:

hist = model.fit(train_dataset, epochs=10, verbose=1, steps_per_epoch=count_data_items(filenames)//64)

There are 2 answers

Andrey On

It is not enough to create a strategy; you also have to use it correctly. In particular, the model must be built and compiled inside strategy.scope(), otherwise training does not actually run on the TPU.

You probably have to tune your pipeline, increase batch size, etc.
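A minimal sketch of what "using the strategy correctly" typically looks like (the model architecture and the per-replica batch size of 64 are placeholder assumptions, and train_dataset is assumed to be an unbatched tf.data.Dataset):

import tensorflow as tf

# Scale the batch size with the number of TPU cores (8 on Colab);
# otherwise each core only sees a small slice of every batch.
per_replica_batch_size = 64  # placeholder value
global_batch_size = per_replica_batch_size * strategy.num_replicas_in_sync

with strategy.scope():
    # The model and optimizer must be created inside the scope
    # so their variables are replicated across the TPU cores.
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(128, activation='relu'),
        tf.keras.layers.Dense(10),
    ])
    model.compile(
        optimizer='adam',
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
        metrics=['accuracy'],
    )

# Keep the TPU fed: batch with static shapes and prefetch.
train_dataset = (
    train_dataset
    .batch(global_batch_size, drop_remainder=True)
    .prefetch(tf.data.experimental.AUTOTUNE)
)

drop_remainder=True keeps every batch the same shape, which matters on TPU because XLA compiles a separate program for each distinct input shape.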

Have a look here: https://cloud.google.com/tpu/docs/performance-guide

Another important point is that the TPU has a warm-up period: it spends a lot of time building the computation graph during the first calls (and again for every call with a new input shape).
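You can see this warm-up effect by timing the first epoch separately from the later ones (a sketch; steps is a placeholder for whatever steps_per_epoch value you use):

import time

# The first call pays the XLA compilation cost; subsequent calls
# with the same input shapes reuse the cached compiled program.
start = time.time()
model.fit(train_dataset, epochs=1, steps_per_epoch=steps)  # includes compilation
print('first epoch: ', time.time() - start)

start = time.time()
model.fit(train_dataset, epochs=1, steps_per_epoch=steps)  # already compiled
print('second epoch:', time.time() - start)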

Praneetha R On

The number of TPU cores available in Colab notebooks is currently 8.

Takeaways: from observing the training time, it can be seen that the TPU takes considerably more training time than the GPU when the batch size is small, but as the batch size increases the TPU performance becomes comparable to that of the GPU.
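In practice that means choosing a batch size large enough to keep all 8 cores busy, e.g. scaling the question's 64 by the replica count (a sketch reusing the question's names; 64 per core is just an illustrative value):

# With 8 TPU cores, 64 per core gives a global batch of 512;
# steps_per_epoch shrinks accordingly.
global_batch_size = 64 * strategy.num_replicas_in_sync  # 512 on Colab TPU
steps_per_epoch = count_data_items(filenames) // global_batch_size

hist = model.fit(train_dataset, epochs=10, verbose=1,
                 steps_per_epoch=steps_per_epoch)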