Let's say I have a neural network named 'NN' with 500 weights and biases (total parameters = 500).
For one training sample: it is fed through 'NN', which spits out an output (Out1); the output is compared to the training label, and the backpropagation algorithm yields a small change (positive or negative) for every parameter of 'NN'. The cost function is represented by a vector of dimensions 1x500, containing all the small modifications obtained by the backpropagation algorithm.
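For concreteness, here is a minimal Octave sketch of that single-sample step, with a hypothetical one-layer linear model standing in for 'NN' (w, x1, y1 are illustrative names, not from the question):

w = randn(500, 1);             % the 500 parameters, flattened into one vector (hypothetical)
x1 = randn(1, 500); y1 = 0.7;  % one training sample and its label (hypothetical)
Out1 = x1 * w;                 % forward pass: the network's output
g = 2 * (Out1 - y1) * x1;      % backprop for a squared-error loss:
                               % a 1x500 vector, one small +/- change per parameter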
Let's say mini_batch_size = 10.
For one mini-batch: each of the 10 training samples provides a cost function of dimensions 1x500.
To visualize and explain this better, let's say we create a 10x500 matrix (called M), where each row is the cost function of one training sample.
Question: for the mini-batch training example, is the final cost function of the mini-batch the average of the elements of each column?
P.S. In case the question is not clear enough, I've left some code below showing exactly what I mean.
Cost_mini_batch = zeros(1, 500);             % preallocate the result
for j = 1:500
    Cost_mini_batch(j) = sum(M(:, j)) / 10;  % average column j over the 10 samples
end
The dimensions of Cost_mini_batch are 1x500.
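As an aside, the loop can be collapsed into a single call to the built-in mean, which averages down each column of a matrix:

Cost_mini_batch = mean(M, 1);  % 1x500, identical to the loop above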
"Cost" refers to the loss, i.e. the error between Out1 and the training label.
What you describe is called the "gradient", not the cost function. The cost function is a single scalar (the loss), while the 1x500 vector of per-parameter changes is the gradient of that scalar with respect to the parameters.
Yes: both the gradient and the cost function of a mini-batch are averages over its examples. The mini-batch gradient is the average of the per-example gradients (exactly the column-wise average your code computes), and the mini-batch cost is the average of the per-example losses.
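A minimal Octave sketch of both averages, reusing a hypothetical linear model in place of 'NN' (X, y, w, eta are illustrative names, not from the question):

X = randn(10, 500);              % a mini-batch: 10 samples, 500 parameters (hypothetical)
y = randn(10, 1);                % the 10 training labels (hypothetical)
w = randn(500, 1);               % current parameters
err = X * w - y;                 % per-sample error (10x1)
M = 2 * (err .* X);              % row i = gradient of sample i's squared-error loss (10x500)
grad_mini_batch = mean(M, 1);    % mini-batch gradient: the 1x500 column-wise average
cost_mini_batch = mean(err.^2);  % mini-batch cost: a single scalar average
eta = 0.01;                      % learning rate (assumed)
w = w - eta * grad_mini_batch';  % one gradient-descent step for the whole mini-batch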