Wouldn't setting the first derivative of cost function J to 0 give the exact Theta values that minimize the cost?

I am currently doing Andrew Ng's ML course. From my calculus knowledge, the first derivative test gives a function's critical points, if there are any. And given the convex nature of the linear/logistic regression cost function, it is a given that there will be a global minimum. If that is the case, rather than taking the long route of minuscule baby steps toward the global minimum, why don't we use the first derivative test to get the values of Theta that minimize the cost function J in a single attempt and have a happy ending?

That being said, I do know that there is an alternative to Gradient Descent called the Normal Equation that does just that, in a single step rather than many.

On second thought, I wonder if it is mainly because of the multiple unknown variables involved in the equation (which is why partial derivatives come into play?).
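For linear regression, the multivariable version of the first derivative test sets every partial derivative of J to zero at the same time. Written with the course's cost function, m training examples, and a design matrix X whose rows are the examples x^{(i)} (with x_0^{(i)} = 1 for the intercept), this gives a system of linear equations whose solution is exactly the Normal Equation:

$$J(\theta) = \frac{1}{2m}\sum_{i=1}^{m}\left(\theta^T x^{(i)} - y^{(i)}\right)^2$$

$$\frac{\partial J}{\partial \theta_j} = \frac{1}{m}\sum_{i=1}^{m}\left(\theta^T x^{(i)} - y^{(i)}\right)x_j^{(i)} = 0, \qquad j = 0, \dots, n$$

$$\iff X^T X\,\theta = X^T y \iff \theta = (X^T X)^{-1} X^T y \quad \text{(when } X^T X \text{ is invertible)}$$

For logistic regression the corresponding equations are nonlinear in θ and have no closed-form solution, so an iterative method is needed there regardless of cost.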

Shivendra answered:

Let's take an example:

Take the gradient of the simple regression cost function (the residual sum of squares):

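$$\nabla \mathrm{RSS}(w) = \nabla\left[(y - Hw)^T (y - Hw)\right] = -2H^T (y - Hw)$$

where

y  : output vector
H  : feature matrix (one row per observation, one column per feature)
w  : weights
RSS: residual sum of squares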
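A minimal sketch of the gradient above in plain Python (lists instead of NumPy, so it runs anywhere). The function names and the toy data are made up for illustration; the finite-difference check is only there to confirm the analytic formula -2 H^T (y - Hw).

```python
# Sketch: RSS(w) = (y - Hw)^T (y - Hw) and its gradient -2 H^T (y - Hw).
# Hypothetical helper names; H is a list of rows, y and w are lists.

def matvec(H, w):
    """Return the product H w for H given as a list of rows."""
    return [sum(h_ij * w_j for h_ij, w_j in zip(row, w)) for row in H]

def rss(H, y, w):
    """Residual sum of squares (y - Hw)^T (y - Hw)."""
    residuals = [y_i - p_i for y_i, p_i in zip(y, matvec(H, w))]
    return sum(r * r for r in residuals)

def rss_gradient(H, y, w):
    """Analytic gradient -2 H^T (y - Hw), one entry per weight."""
    residuals = [y_i - p_i for y_i, p_i in zip(y, matvec(H, w))]
    return [-2.0 * sum(H[i][j] * residuals[i] for i in range(len(H)))
            for j in range(len(w))]

def numeric_gradient(H, y, w, eps=1e-6):
    """Central finite differences, used only to check rss_gradient."""
    grad = []
    for j in range(len(w)):
        w_plus = list(w)
        w_plus[j] += eps
        w_minus = list(w)
        w_minus[j] -= eps
        grad.append((rss(H, y, w_plus) - rss(H, y, w_minus)) / (2 * eps))
    return grad

# Made-up toy data: first column is a constant 1 (intercept), y = 1 + 2x exactly.
H = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [1.0, 3.0, 5.0, 7.0]
w = [0.5, 0.5]

print(rss_gradient(H, y, w))      # -> [-22.0, -48.0]
print(numeric_gradient(H, y, w))  # approximately the same values
```

At w = [1.0, 2.0] the residuals are all zero, so the gradient vanishes, which is exactly the point the closed-form solution looks for.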

Setting this gradient equal to 0 and solving for w gives the closed-form solution:

$$w = (H^T H)^{-1} H^T y$$
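A minimal sketch of the closed form in plain Python, assuming the same kind of toy data as before. It solves the linear system (H^T H) w = H^T y by Gaussian elimination with partial pivoting instead of forming the inverse explicitly, which is the usual numerically preferred route; the function name `normal_equation` is hypothetical. Forming H^T H costs O(N D^2) for N observations and the elimination costs O(D^3).

```python
def normal_equation(H, y):
    """Solve (H^T H) w = H^T y by Gaussian elimination with partial pivoting."""
    n, D = len(H), len(H[0])
    # A = H^T H is D x D; b = H^T y has length D.
    A = [[sum(H[k][i] * H[k][j] for k in range(n)) for j in range(D)] for i in range(D)]
    b = [sum(H[k][i] * y[k] for k in range(n)) for i in range(D)]

    # Forward elimination: O(D^3) work.
    for col in range(D):
        pivot = max(range(col, D), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("H^T H is singular; the closed form does not exist")
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, D):
            factor = A[row][col] / A[col][col]
            for j in range(col, D):
                A[row][j] -= factor * A[col][j]
            b[row] -= factor * b[col]

    # Back substitution.
    w = [0.0] * D
    for i in range(D - 1, -1, -1):
        w[i] = (b[i] - sum(A[i][j] * w[j] for j in range(i + 1, D))) / A[i][i]
    return w

# Made-up toy data lying exactly on y = 1 + 2x.
H = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [1.0, 3.0, 5.0, 7.0]
print(normal_equation(H, y))  # close to [1.0, 2.0], up to floating-point rounding
```

The singularity check covers the case where features are linearly dependent, in which H^T H has no inverse and the formula above does not apply as written.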

Now assume there are D features. The expensive part is not the transpose (which is cheap) but inverting the D × D matrix H^T H, which takes around O(D^3) time; forming H^T H itself costs O(N D^2) for N observations. With a million features, this is computationally infeasible within a reasonable amount of time.
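A back-of-the-envelope estimate for D = 10^6, ignoring constant factors and assuming, purely for illustration, a machine that sustains 10^12 floating-point operations per second and stores 8-byte floats:

$$D^3 = (10^6)^3 = 10^{18}\ \text{operations}, \qquad \frac{10^{18}}{10^{12}\ \text{operations/s}} = 10^{6}\ \text{s} \approx 11.6\ \text{days}$$

$$\text{storage for } H^T H:\quad D^2 \times 8\ \text{bytes} = 10^{12} \times 8\ \text{bytes} = 8\ \text{TB}$$

Gradient descent never forms H^T H, so it sidesteps both the time and the memory cost.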

We use gradient descent methods instead because they reach a reasonably acceptable solution in much less time: each iteration needs only one product with H and one with H^T, which is O(N D).
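A minimal sketch of batch gradient descent on the same RSS cost, in plain Python. The learning rate and iteration count are assumptions hand-picked for this made-up toy data, not general recommendations; each iteration only computes H w and H^T (y - Hw), i.e. O(N D) work, and never builds or inverts H^T H.

```python
def gradient_descent(H, y, learning_rate=0.01, iterations=5000):
    """Batch gradient descent on RSS(w) = (y - Hw)^T (y - Hw)."""
    n, D = len(H), len(H[0])
    w = [0.0] * D
    for _ in range(iterations):
        # Predictions H w and residuals y - H w: O(N D).
        predictions = [sum(H[i][j] * w[j] for j in range(D)) for i in range(n)]
        residuals = [y[i] - predictions[i] for i in range(n)]
        # Gradient -2 H^T (y - Hw): another O(N D).
        grad = [-2.0 * sum(H[i][j] * residuals[i] for i in range(n)) for j in range(D)]
        # Take a small step downhill.
        w = [w[j] - learning_rate * grad[j] for j in range(D)]
    return w

# Made-up toy data lying exactly on y = 1 + 2x.
H = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [1.0, 3.0, 5.0, 7.0]
print(gradient_descent(H, y))  # converges toward [1.0, 2.0]
```

On this tiny problem the closed form is obviously cheaper; the per-iteration O(N D) cost only pays off when D is large enough that O(D^3) becomes the bottleneck. If the learning rate is too large, the iterates diverge instead of converging.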