Implementing naive gradient descent in Python


I'm trying to implement a very naive gradient descent in Python. However, it looks like it goes into an infinite loop. Could you please help me debug it?

y = lambda x : x**2
dy_dx = lambda x : 2*x
def gradient_descent(function,derivative,initial_guess):
    optimum = initial_guess
    while derivative(optimum) != 0:
        optimum = optimum - derivative(optimum)
    else:
        return optimum
gradient_descent(y,dy_dx,5)

Edit:

Now I have this code, but I really can't make sense of the output. (P.S.: it might freeze your CPU.)

y = lambda x : x**2
dy_dx = lambda x : 2*x
def gradient_descent(function,derivative,initial_guess):
    optimum = initial_guess
    while abs(derivative(optimum)) > 0.01:
        optimum = optimum - 2*derivative(optimum)
        print((optimum,derivative(optimum)))
    else:
        return optimum
gradient_descent(y,dy_dx,5) 

Now I'm trying to apply it to a regression problem; however, the fitted line doesn't appear to be correct, as shown in the output below:

[Plot: output of the gradient descent code below, showing the incorrect fitted line]

import matplotlib.pyplot as plt
def stepGradient(x,y, step):
    b_current = 0 
    m_current = 0
    b_gradient = 0
    m_gradient = 0
    N = int(len(x))   
    for i in range(0, N):
        b_gradient += -(1/N) * (y[i] - ((m_current*x[i]) + b_current))
        m_gradient += -(1/N) * x[i] * (y[i] - ((m_current * x[i]) + b_current))
    while abs(b_gradient) > 0.01 and abs(m_gradient) > 0.01:
        b_current = b_current - (step * b_gradient)
        m_current = m_current - (step * m_gradient)
        # note: b_gradient and m_gradient are never reset inside this loop, so they
        # keep accumulating from one iteration to the next (corrected in the fixed version below)
        for i in range(0, N):
            b_gradient += -(1/N) * (y[i] - ((m_current*x[i]) + b_current))
            m_gradient += -(1/N) * x[i] * (y[i] - ((m_current * x[i]) + b_current))
    return [b_current, m_current]

x = [1,2, 2,3,4,5,7,8]
y = [1.5,3,1,3,2,5,6,7]
step = 0.00001
(b,m) = stepGradient(x,y,step)


plt.scatter(x,y)
abline_values = [m * i + b for i in x]
plt.plot(x, abline_values, 'b')
plt.show()

Fixed :D

import matplotlib.pyplot as plt
def stepGradient(x,y):
    step = 0.001
    b_current = 0 
    m_current = 0
    b_gradient = 0
    m_gradient = 0
    N = int(len(x))   
    for i in range(0, N):
        b_gradient += -(1/N) * (y[i] - ((m_current*x[i]) + b_current))
        m_gradient += -(1/N) * x[i] * (y[i] - ((m_current * x[i]) + b_current))
    while abs(b_gradient) > 0.01 or abs(m_gradient) > 0.01:
        b_current = b_current - (step * b_gradient)
        m_current = m_current - (step * m_gradient)
        # reset the accumulated gradients before recomputing them for the updated line
        b_gradient = 0
        m_gradient = 0
        for i in range(0, N):
            b_gradient += -(1/N) * (y[i] - ((m_current*x[i]) + b_current))
            m_gradient += -(1/N) * x[i] * (y[i] - ((m_current * x[i]) + b_current))
    return [b_current, m_current]

x = [1,2, 2,3,4,5,7,8,10]
y = [1.5,3,1,3,2,5,6,7,20]
(b,m) = stepGradient(x,y)


plt.scatter(x,y)
abline_values = [m * i + b for i in x]
plt.plot(x, abline_values, 'b')
plt.show()

There are 2 answers

Rory Daulton answered:

Your while loop stops only when a calculated floating-point value equals zero. This is naïve, since floating-point values are rarely calculated exactly. Instead, stop the loop when the calculated value is close enough to zero. Use something like

while abs(derivative(optimum)) > eps:

where eps is the desired precision of the calculated value. This could be made another parameter, perhaps with a default value of 1e-10 or some such.
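
A minimal sketch of that idea, with eps made an extra parameter (the default values here are only illustrative, and the small step factor anticipates the overshooting issue discussed next):

y = lambda x : x**2
dy_dx = lambda x : 2*x

def gradient_descent(function, derivative, initial_guess, step=0.01, eps=1e-10):
    optimum = initial_guess
    # stop once the gradient is close enough to zero, instead of waiting for it to be exactly zero
    while abs(derivative(optimum)) > eps:
        optimum = optimum - step * derivative(optimum)
    return optimum

print(gradient_descent(y, dy_dx, 5))  # prints a value very close to 0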


That said, the problem in your case is worse. Your algorithm is far too naïve in assuming that the calculation

optimum = optimum - 2*derivative(optimum)

will move the value of optimum closer to the actual optimum value. In your case it does the opposite. With derivative(x) = 2*x, the update optimum - 2*derivative(optimum) works out to -3*optimum, so starting from your initial guess of 5 the iterates are 5, -15, 45, -135, ...: the sign flips on every step and the magnitude grows without bound. (With your original update optimum - derivative(optimum), the value simply cycles back and forth between 5 and -5 forever, since the derivative at 5 is 10 and the derivative at -5 is -10.)

So you need to stop the update from overshooting. You could multiply your delta 2*derivative(optimum) by something sufficiently smaller than 1, which would work in your particular case y=x**2. But this will not work in general.
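
Concretely, for y = x**2 a scaled update x_new = x - gamma*dy_dx(x) = (1 - 2*gamma)*x shrinks toward zero exactly when 0 < gamma < 1, which is why the small step size in the second answer below works for this function.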

To be completely safe, 'bracket' your optimum point with a smaller value and a larger value, and use the derivative to find the next guess. But ensure that your next guess does not go outside the bracketed interval. If it does, or if the convergence of your guesses is too slow, use another method such as bisection or golden mean search.
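
As a rough illustration of the bisection fallback (bisect_minimum is a hypothetical helper, not from the question; it assumes the derivative is negative at lo and positive at hi, i.e. the minimum lies inside the bracket):

def bisect_minimum(derivative, lo, hi, eps=1e-10):
    # shrink the bracket [lo, hi] until it pins down the point where the derivative crosses zero
    while hi - lo > eps:
        mid = (lo + hi) / 2
        if derivative(mid) > 0:
            hi = mid  # the minimum is to the left of mid
        else:
            lo = mid  # the minimum is to the right of mid
    return (lo + hi) / 2

bisect_minimum(lambda x : 2*x, -5.0, 5.0)  # returns approximately 0 for y = x**2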

Of course, this means your 'very naïve gradient descent' algorithm is too naïve to work in general. That's why real optimization routines are more complicated.

rofls answered:

You also need to decrease your step size (gamma in the gradient descent formula):

y = lambda x : x**2
dy_dx = lambda x : 2*x
def gradient_descent(function,derivative,initial_guess):
    optimum = initial_guess
    while abs(derivative(optimum)) > 0.01:
        # 0.01 here is the step size (gamma); keeping it small stops the update from overshooting
        optimum = optimum - 0.01*derivative(optimum)
        print((optimum,derivative(optimum)))
    return optimum
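
Called the same way as the question's code, e.g. gradient_descent(y, dy_dx, 5), the iterates now shrink steadily toward zero and the function returns roughly 0.005, the first point at which the absolute value of the derivative falls below the 0.01 tolerance.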