Suppose I have a neural network with multiple outputs and I'm dealing with a regression of V values, so the last layer has V outputs.
If I use stochastic gradient descent, how is the cost calculated for updating the parameters of my network? For a given instance i, do we calculate the mean squared error for each output (the squared difference between the actual and the expected output, divided by the number of instances) considering only this instance, and then sum these V values to get the cost for instance i?
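To make the question concrete, here is a minimal sketch of the computation I have in mind. It assumes a single instance per update, so dividing by the number of instances is a division by 1 and is left out. The function name `per_instance_cost` and the example values are hypothetical, chosen only for illustration. Whether this is the right cost is exactly what I am asking.

```python
def per_instance_cost(predicted, target):
    """Sum of squared errors over the V outputs for one training instance."""
    if len(predicted) != len(target):
        raise ValueError("predicted and target must both have V entries")
    # One squared error per output unit.
    squared_errors = [(p - t) ** 2 for p, t in zip(predicted, target)]
    # Summing over the V outputs gives a single scalar cost for this instance.
    return sum(squared_errors)


# Hypothetical instance i with V = 3 outputs.
predicted_i = [2.0, 0.0, 4.0]
target_i = [1.0, 2.0, 4.0]
cost_i = per_instance_cost(predicted_i, target_i)
print(cost_i)  # 5.0
```

Here the three squared errors are 1, 4 and 0, so the cost for the instance is 5. Some formulations would instead average over the V outputs, which only rescales the gradient by 1/V.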