I would like to know which algorithmic steps of machine learning TPUs are typically used for in the current state of the art. In particular, I am interested in whether they are used for inference, for backpropagation, and/or for convolutions.
I know how a systolic array works and understand the basic principle of a TPU, so it makes sense to me that they can perform dense (non-sparse) matrix multiplications much faster than a CPU or GPU. For convolutions, however, the matrices being multiplied are generally very sparse when the convolution is written as a single matrix product. Does it still make sense to use TPUs there?
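To make the sparsity concern concrete, here is a minimal sketch in plain Python of what I mean. It is my own toy illustration with a 1D signal and hypothetical helper names (`conv_as_toeplitz`, `im2col`, `density`), and I am not claiming this is how a TPU or its compiler actually handles convolutions. It writes the same valid cross-correlation two ways: as a product with a banded (mostly zero) matrix, and as a product with a dense matrix of input patches.

```python
def conv_as_toeplitz(kernel, n):
    """Build the (n-k+1) x n banded matrix T so that T @ x is the
    valid cross-correlation of a length-n signal x with the kernel."""
    k = len(kernel)
    return [
        [kernel[j - i] if 0 <= j - i < k else 0.0 for j in range(n)]
        for i in range(n - k + 1)
    ]


def im2col(x, k):
    """Stack every length-k window of x as a row: a dense (n-k+1) x k matrix."""
    return [x[i:i + k] for i in range(len(x) - k + 1)]


def matvec(m, v):
    """Plain matrix-vector product on nested lists."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def density(m):
    """Fraction of entries in m that are nonzero."""
    total = sum(len(row) for row in m)
    nonzero = sum(1 for row in m for a in row if a != 0)
    return nonzero / total


x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
kernel = [1.0, 0.5, -1.0]

# Formulation 1: sparse banded matrix times the full signal.
T = conv_as_toeplitz(kernel, len(x))
y_sparse = matvec(T, x)

# Formulation 2: dense patch matrix times the kernel.
patches = im2col(x, len(kernel))
y_dense = matvec(patches, kernel)

print(y_sparse)          # [-1.0, -0.5, 0.0, 0.5, 1.0, 1.5]
print(y_dense)           # same values
print(density(T))        # 0.375: only k of every n entries per row are nonzero
print(density(patches))  # 1.0: every entry is used
```

The first formulation is the one I had in mind when I called the matrices sparse, and its density shrinks further as the input grows relative to the kernel. The second gives the same result from a fully dense product, so part of my question is whether something along these lines is what makes convolutions a reasonable fit for a systolic array.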
I would appreciate a thorough explanation regarding this topic.