I am trying to make my very first Neural Network work. I designed it so that I can choose the number of layers and the number of nodes per layer freely. I had a hard time trying to implement back propagation but I think I have done it recursively even if it is not as performant as it can be. I am using the sigmoid as an activation for all nodes (even the input nodes and the output node).
My network has a single output node in the output layer that should predict a variable (zero or one).
My question is how exactly should I do to train my network ? I noticed that when I use the following algorithm:
for i in [1:100000]
- feed the same record to my neural network
- perform a forward pass
- compute the error using the square of the difference as a loss function for this record with the current weights
- Update the weights using back propagation
it converges to the correct result (the output node value converges to zero when the record is labeled with zero, and to one when the record is labeled as one). But when I feed a different record to the network at each time of this iterative algorithm the network completely diverges.
Suppose that I would like to work with a mini batch of N records, this means that I have to make N forward passes giving at each time one of the N records as input to the network, comùpute the error, take the average over the N records, but then, when I would like to use the average error in the back propagation algorithm, what input record should I use ? Because, as far as I know the input layer is also used to compute the weights between it and the first hidden layer. Should I then use the last one of the N records as input? Or the first one ? Does it even matter ? I am a bit confused here and I found nothing on the internet to answer this particular question.
Best regards.