I used Caffe for some time but am now using TensorFlow. Caffe has a hyperparameter 'iter_size', which accumulates gradients over iter_size x batch_size instances before applying an update. 'iter_size' is useful when GPU memory is limited and there are not enough GPUs.
I am wondering whether we can do the same operation in TensorFlow. I have seen this question. It accumulates the gradients, but it does not reset the accumulated gradients to zero after applying them to the variables.
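To make the goal concrete, here is a rough sketch of what I am after, written in TF 1.x graph mode. The loss tensor, the learning rate, and iter_size = 4 are placeholders for illustration only; the key point is the zeroing step at the end, which the linked answer is missing:

```python
import tensorflow as tf

# Assume `loss` is the scalar training loss of the model (not defined here).
opt = tf.train.GradientDescentOptimizer(learning_rate=0.01)
tvars = tf.trainable_variables()

# One non-trainable accumulator variable per trainable variable.
accum_vars = [tf.Variable(tf.zeros_like(v.initialized_value()), trainable=False)
              for v in tvars]

grads_and_vars = opt.compute_gradients(loss, tvars)

# Add the current mini-batch gradients into the accumulators.
accum_ops = [accum_vars[i].assign_add(g)
             for i, (g, _) in enumerate(grads_and_vars)]

iter_size = 4  # number of mini-batches to accumulate over (illustrative value)

# Apply the averaged accumulated gradients to the variables.
apply_op = opt.apply_gradients(
    [(accum_vars[i] / iter_size, v) for i, (_, v) in enumerate(grads_and_vars)])

# Reset the accumulators to zero -- the step I want after every apply.
zero_ops = [v.assign(tf.zeros_like(v)) for v in accum_vars]
```

The intended training loop would then look roughly like this, feeding a different mini-batch on each accumulation step:

```python
sess.run(zero_ops)
for _ in range(iter_size):
    sess.run(accum_ops)  # feed one mini-batch per run via feed_dict
sess.run(apply_op)
```

Is this the right way to do it in TensorFlow, or is there a more idiomatic mechanism?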