Since I have a large dataset and my PC is not very powerful, I thought it would be a good idea to use a TPU on Google Colab.
So, here is my TPU configuration:
try:
    tpu = tf.distribute.cluster_resolver.TPUClusterResolver()
    print('Running on TPU ', tpu.master())
except ValueError:
    tpu = None
if tpu:
    tf.config.experimental_connect_to_cluster(tpu)
    tf.tpu.experimental.initialize_tpu_system(tpu)
    strategy = tf.distribute.experimental.TPUStrategy(tpu)
else:
    strategy = tf.distribute.get_strategy()
print("REPLICAS: ", strategy.num_replicas_in_sync)
And here is my training call:
hist = model.fit(train_dataset, epochs=10, verbose=1, steps_per_epoch=count_data_items(filenames)//64)
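Creating a strategy is not enough; you also have to use it correctly. For example, the model should be built and compiled inside `with strategy.scope():`, otherwise its variables are not placed on the TPU replicas.
You will probably also have to tune your input pipeline, increase the batch size, etc.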
Have a look here: https://cloud.google.com/tpu/docs/performance-guide
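As a rough illustration of the batch-size point, here is a minimal sketch of how the global batch size and `steps_per_epoch` could be derived from the number of replicas. It assumes the `64` in your `fit` call is meant as the batch size per TPU core; `tpu_batch_plan` is a hypothetical helper, and the item count and the 8 replicas are made-up numbers, not values from your setup.

```python
def tpu_batch_plan(num_items, per_replica_batch, num_replicas):
    """Return (global_batch_size, steps_per_epoch) for a distributed run.

    Hypothetical helper: the global batch is the per-replica batch
    multiplied by the number of replicas working in sync.
    """
    if min(num_items, per_replica_batch, num_replicas) <= 0:
        raise ValueError("all arguments must be positive")
    global_batch = per_replica_batch * num_replicas
    # Integer division: a trailing partial batch is not counted as a step.
    steps_per_epoch = num_items // global_batch
    return global_batch, steps_per_epoch


# Made-up numbers: 100,000 examples, 64 per core, 8 replicas.
global_batch, steps = tpu_batch_plan(100_000, 64, 8)
print(global_batch, steps)  # 512 195
```

In your code the inputs would be `count_data_items(filenames)` and `strategy.num_replicas_in_sync`; the resulting global batch size is what the dataset should be batched with, and `steps` is what goes into `steps_per_epoch`.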
Another important point is that a TPU has a warm-up period: it spends a lot of time building a computation graph during the first calls (and again on every call with a new input shape).
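To see where a "new input shape" can sneak in, here is a small pure-Python sketch that lists the batch sizes an epoch produces with and without dropping the last partial batch. `batch_sizes` is a hypothetical helper and the numbers are invented; with `tf.data` the equivalent switch is the `drop_remainder` argument of `Dataset.batch`.

```python
def batch_sizes(num_items, batch_size, drop_remainder):
    """List the size of every batch produced in one pass over the data."""
    full_batches, rest = divmod(num_items, batch_size)
    sizes = [batch_size] * full_batches
    if rest and not drop_remainder:
        # The trailing partial batch has a different leading dimension,
        # i.e. it is a second input shape.
        sizes.append(rest)
    return sizes


# Made-up numbers: 1000 examples in batches of 64.
shapes_without_drop = set(batch_sizes(1000, 64, drop_remainder=False))
shapes_with_drop = set(batch_sizes(1000, 64, drop_remainder=True))
print(sorted(shapes_without_drop))  # [40, 64]
print(sorted(shapes_with_drop))     # [64]
```

With a single batch shape the graph only has to be built once, so the warm-up cost is paid during the first steps and not again at the end of the epoch.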