Basic Machine Learning: Linear Regression and Gradient Descent


I'm taking Andrew Ng's ML class on Coursera and am a bit confused about gradient descent. Here is a screenshot of the formula I'm confused by:

Formula in question

In his second formula, why does he multiply by the value of the ith training example? I thought that when you updated, you were just subtracting the step size * the cost function (which shouldn't include the ith training example).

What am I missing? It doesn't make much sense to me, especially since the ith training example is a series of values, not just one...

Thanks, bclayman

1 answer

Accepted answer, by div:

Mathematically, we are trying to minimise the error (cost) function

Error(θ) = Σ_i (h(x^(i)) - y^(i))^2    (summed over the training examples i; Ng's version also carries a constant 1/(2m) factor, which doesn't move the minimum).

Gradient descent updates each parameter θ_j by subtracting the step size times the partial derivative of this error with respect to that parameter:

θ_j := θ_j - α * ∂Error(θ)/∂θ_j

Substituting the hypothesis

h(x^(i)) = Σ_j θ_j * x_j^(i)    (summed over the features j)

and applying the chain rule gives

∂Error(θ)/∂θ_j = Σ_i 2 * (h(x^(i)) - y^(i)) * x_j^(i)

because the derivative of h(x^(i)) with respect to θ_j is exactly x_j^(i). That inner factor is the "value of the ith training example" you are asking about: each example's error gets multiplied by that example's jth feature value before the results are summed, which is what the second formula in your screenshot says.
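For concreteness, here is a minimal NumPy sketch of one batch gradient descent update for linear regression (my own illustration, not code from the course; the names X, y, theta, alpha and gradient_descent_step are just for this example). It uses the averaged gradient, i.e. a 1/m convention, which only rescales the effective step size. The multiplication by the training examples shows up as the X.T @ errors product:

```python
import numpy as np

def gradient_descent_step(X, y, theta, alpha):
    """One batch gradient descent update for linear regression.

    X: (m, n) design matrix (first column of ones for the intercept)
    y: (m,) targets, theta: (n,) parameters, alpha: learning rate.
    """
    m = X.shape[0]
    predictions = X @ theta        # h(x^(i)) for every example i
    errors = predictions - y       # h(x^(i)) - y^(i)
    # The chain rule brings out x_j^(i): each example's error is multiplied
    # by that example's feature values before averaging over the m examples.
    gradient = (X.T @ errors) / m
    return theta - alpha * gradient

# Tiny usage example on made-up data where y ≈ 2 + 3x.
X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([5.0, 8.0, 11.0])
theta = np.zeros(2)
for _ in range(2000):
    theta = gradient_descent_step(X, y, theta, alpha=0.1)
print(theta)  # approaches [2, 3]
```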

The rest of the formula can be reasoned about as follows.

Gradient descent uses the slope of the function itself to find the minimum. Think of it as walking downhill into a valley, always stepping in the direction of steepest descent. That gives us the direction, but what should the step size be (how far should we move in that direction before re-checking)?

The slope helps with that too, because at a minimum the slope is zero. (Picture the bottom of a valley: all nearby points are higher. Coming down one side the height decreases and the slope is negative; going up the other side the height increases and the slope is positive; in between, at the minimum itself, the slope passes through zero.) Since the magnitude of the slope shrinks as we approach the minimum, we can let it set the step: when the slope is large we take big steps, and when it is small we are closing in on the minimum and take small steps. With a fixed learning rate α this happens automatically, because the step α * slope shrinks along with the slope.
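As a toy illustration of that point (a made-up 1D function, not from the lecture), here is what happens when the learning rate stays fixed but the slope shrinks near the minimum:

```python
def f_prime(theta):
    # Derivative of f(theta) = (theta - 3)^2, which has its minimum at theta = 3.
    return 2.0 * (theta - 3.0)

theta = 10.0
alpha = 0.1  # fixed learning rate
for step in range(10):
    slope = f_prime(theta)
    theta = theta - alpha * slope
    print(f"step {step}: slope = {slope:+.3f}, new theta = {theta:.3f}")

# The printed slope magnitudes decrease each iteration, so the effective step
# size alpha * slope shrinks automatically even though alpha never changes.
```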