Derivation of the Backpropagation Algorithm for Neural Networks


Perhaps this is a dumb question, but this doubt is really preventing me from understanding backpropagation. I was reading the Wikipedia article on backpropagation and trying to understand it. It states that the discrepancy is E = (t - y)^2, and then:

However, the output of a neuron depends on the weighted sum of all its inputs:

y = x1*w1 + x2*w2

Shouldn't it be y = phi(x1*w1 + x2*w2)?

And if y = phi(x1*w1 + x2*w2), isn't the plot of the discrepancy vs. the weights supposed to look like a step function, with one segment of weights returning the minimum and the rest not (because some combinations of weights return 0 and the others return 1, and only one of those answers is correct)?


1 Answer

Gabriel (Best answer)

Ok, I understand why you thought that, but 'y' there is the weighted input sum, and the output depends only on it. If you want to find the output, it's very simple: you just apply the activation function phi to it. In this case I think we should use the logistic function (the sigmoid curve) as phi, because it makes things easier to understand when we plot a graph of something that changes over time.

So let's take a look at the function you are talking about, y = phi(x1*w1 + x2*w2). We know that phi(z) = 1/(1+e^(-z)), so we can combine both equations to find the output (o): o = 1/(1+e^(-(x1*w1 + x2*w2))).
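For concreteness, here is a minimal Python sketch of that forward computation; the input and weight values are made up just for illustration:

    import math

    def phi(z):
        # Logistic (sigmoid) activation: 1 / (1 + e^(-z))
        return 1.0 / (1.0 + math.exp(-z))

    # Hypothetical example values for inputs and weights
    x1, x2 = 0.5, -1.2
    w1, w2 = 0.8, 0.3

    z = x1 * w1 + x2 * w2   # the weighted input sum
    o = phi(z)              # output: o = 1 / (1 + e^(-(x1*w1 + x2*w2)))
    print(o)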

Perfect. Now, if you want to find out whether this is a step function, we can apply some calculus and use the notion of continuity.

The activation function is differentiable (it is a continuous function), which guarantees you can find the partial derivatives of the error when you need to. Knowing this, we can say that because phi is continuous and (x1*w1 + x2*w2) is a polynomial function (also continuous), our final function 'o' is continuous as well, so it is not a step function.
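To see why that differentiability matters, here is a small sketch (again with made-up numbers) that computes the partial derivatives of E = (t - o)^2 with respect to w1 and w2 via the chain rule; since o varies smoothly with the weights instead of jumping between 0 and 1, E changes smoothly too:

    import math

    def phi(z):
        return 1.0 / (1.0 + math.exp(-z))

    # Hypothetical inputs, weights, and target value
    x1, x2 = 0.5, -1.2
    w1, w2 = 0.8, 0.3
    t = 1.0

    z = x1 * w1 + x2 * w2
    o = phi(z)
    E = (t - o) ** 2

    # Chain rule: dE/dw_i = dE/do * do/dz * dz/dw_i
    dE_do = -2.0 * (t - o)    # derivative of (t - o)^2 with respect to o
    do_dz = o * (1.0 - o)     # derivative of the sigmoid
    dE_dw1 = dE_do * do_dz * x1
    dE_dw2 = dE_do * do_dz * x2

    print(E, dE_dw1, dE_dw2)  # E varies smoothly with w1, w2; no step

These partial derivatives are exactly what gradient descent uses to update the weights in backpropagation, which is why a differentiable activation like the sigmoid is used instead of a hard threshold.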