I would like to know exactly what `tape.gradient` does in eager mode when it is given a batch. Does it aggregate the gradients across the examples in the batch? If so, does it sum them, average them, or do something else?
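To make the question concrete, here is a minimal sketch. It assumes TensorFlow 2.x and a toy linear model; the names `x`, `y`, `w` and `per_example_loss` are hypothetical. It compares passing an unreduced vector of per-example losses to `tape.gradient` with passing its `tf.reduce_sum` and its `tf.reduce_mean`. The comparison relies on TensorFlow's documented behavior: for a non-scalar target, the gradient is that of the sum of the target's elements. The returned gradient therefore has the variable's shape (not one entry per example) and matches the summed loss, not the averaged one.

```python
import numpy as np
import tensorflow as tf

# Toy batch (hypothetical values): 4 examples, 3 features each.
x = tf.constant(np.arange(12, dtype=np.float32).reshape(4, 3))
y = tf.constant([[1.0], [2.0], [3.0], [4.0]])
w = tf.Variable(tf.ones([3, 1]))

def per_example_loss():
    # Squared error for each example, shape [4]; NOT reduced over the batch.
    return tf.reshape(tf.square(tf.matmul(x, w) - y), [-1])

# persistent=True so gradient() can be called more than once on this tape.
with tf.GradientTape(persistent=True) as tape:
    losses = per_example_loss()    # non-scalar target, shape [4]
    total = tf.reduce_sum(losses)  # scalar: sum over the batch
    mean = tf.reduce_mean(losses)  # scalar: mean over the batch

g_vector = tape.gradient(losses, w)  # gradient w.r.t. w for the vector target
g_sum = tape.gradient(total, w)
g_mean = tape.gradient(mean, w)
del tape  # release the persistent tape's resources

print(g_vector.shape)                                     # (3, 1), same as w
print(np.allclose(g_vector.numpy(), g_sum.numpy()))       # True: behaves like the sum
print(np.allclose(g_vector.numpy(), 4 * g_mean.numpy()))  # True: batch size times the mean
```

Under these assumptions, an average only appears if the loss is explicitly reduced with something like `tf.reduce_mean` before calling `tape.gradient`.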
Thanks