I recently completed the Cloud ML Criteo tutorial, and one of the final log messages from the distributed training job on the "small" dataset (~40M examples) was:
Saving dict for global step 7520: accuracy = 0.78864, ...
What does "global step" refer to here? I originally thought it was:
global step = (number of training examples * number of epochs) / batch size
However, the training set size is 40.8M, the batch size is 30K, and the number of epochs is 5, so this doesn't lead to the right answer:
(40.8M x 5) / 30K = 6800
I think I understand this now. Even though the training set contains 40.8M examples, there is a line in the code that sets the size to 45M examples (I don't know why). Using that figure instead:
(45M x 5) / 30K = 7500
which roughly matches the 7520 in the log message.
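
To double-check the arithmetic, here is a minimal sketch of the formula above. It assumes one global step per batch across the whole job. The function name `expected_global_steps` is made up for illustration and is not part of the tutorial code. It does not explain the remaining 20-step gap between 7500 and 7520.

```python
def expected_global_steps(num_examples: int, num_epochs: int, batch_size: int) -> int:
    """Estimate global steps, assuming one step per batch (hypothetical helper)."""
    total_examples_seen = num_examples * num_epochs
    # Integer ceiling division: a final partial batch still counts as one step.
    return -(-total_examples_seen // batch_size)


# Actual training set size from the question (40.8M examples).
steps_actual_size = expected_global_steps(40_800_000, 5, 30_000)

# Size that the code reportedly uses instead (45M examples).
steps_configured_size = expected_global_steps(45_000_000, 5, 30_000)

print(steps_actual_size, steps_configured_size)
```

Only the second value lands near the logged global step of 7520, which is consistent with the step count being derived from the 45M figure rather than the real dataset size.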