I'm trying to implement a very naive gradient descent in python. However, it looks like it goes into an infinite loop. Could you please help me debug it?
y = lambda x : x**2
dy_dx = lambda x : 2*x
def gradient_descent(function,derivative,initial_guess):
optimum = initial_guess
while derivative(optimum) != 0:
optimum = optimum - derivative(optimum)
else:
return optimum
gradient_descent(y,dy_dx,5)
Edit:
Now I have this code, I really can't comprehend the output. P.s. It might freeze your CPU.
y = lambda x : x**2
dy_dx = lambda x : 2*x
def gradient_descent(function,derivative,initial_guess):
optimum = initial_guess
while abs(derivative(optimum)) > 0.01:
optimum = optimum - 2*derivative(optimum)
print((optimum,derivative(optimum)))
else:
return optimum
gradient_descent(y,dy_dx,5)
Now I'm trying to apply it to a regression problem, however the output doesn't appear to be correct as shown in the output below:
Output of gradient descent code below
import matplotlib.pyplot as plt
def stepGradient(x,y, step):
b_current = 0
m_current = 0
b_gradient = 0
m_gradient = 0
N = int(len(x))
for i in range(0, N):
b_gradient += -(1/N) * (y[i] - ((m_current*x[i]) + b_current))
m_gradient += -(1/N) * x[i] * (y[i] - ((m_current * x[i]) + b_current))
while abs(b_gradient) > 0.01 and abs(m_gradient) > 0.01:
b_current = b_current - (step * b_gradient)
m_current = m_current - (step * m_gradient)
for i in range(0, N):
b_gradient += -(1/N) * (y[i] - ((m_current*x[i]) + b_current))
m_gradient += -(1/N) * x[i] * (y[i] - ((m_current * x[i]) + b_current))
return [b_current, m_current]
x = [1,2, 2,3,4,5,7,8]
y = [1.5,3,1,3,2,5,6,7]
step = 0.00001
(b,m) = stepGradient(x,y,step)
plt.scatter(x,y)
abline_values = [m * i + b for i in x]
plt.plot(x, abline_values, 'b')
plt.show()
Fixed :D
import matplotlib.pyplot as plt
def stepGradient(x,y):
step = 0.001
b_current = 0
m_current = 0
b_gradient = 0
m_gradient = 0
N = int(len(x))
for i in range(0, N):
b_gradient += -(1/N) * (y[i] - ((m_current*x[i]) + b_current))
m_gradient += -(1/N) * x[i] * (y[i] - ((m_current * x[i]) + b_current))
while abs(b_gradient) > 0.01 or abs(m_gradient) > 0.01:
b_current = b_current - (step * b_gradient)
m_current = m_current - (step * m_gradient)
b_gradient= 0
m_gradient = 0
for i in range(0, N):
b_gradient += -(1/N) * (y[i] - ((m_current*x[i]) + b_current))
m_gradient += -(1/N) * x[i] * (y[i] - ((m_current * x[i]) + b_current))
return [b_current, m_current]
x = [1,2, 2,3,4,5,7,8,10]
y = [1.5,3,1,3,2,5,6,7,20]
(b,m) = stepGradient(x,y)
plt.scatter(x,y)
abline_values = [m * i + b for i in x]
plt.plot(x, abline_values, 'b')
plt.show()
Your
while
loop stops only when a calculated floating-point value equals zero. This is naïve, since floating-point values are rarely calculated exactly. Instead, stop the loop when the calculated value is close enough to zero. Use something likewhere
eps
is the desired precision of the calculated value. This could be made another parameter, perhaps with a default value of1e-10
or some such.That said, the problem in your case is worse. Your algorithm is far too naïve in assuming that the calculation
will move the value of
optimum
closer to the actual optimum value. In your particular case, the variableoptimum
just cycles back and forth between5
(your initial guess) and-5
. Note that the derivative at5
is10
and the derivative at-5
is-10
.So you need to avoid such cycling. You could multiply your delta
2*derivative(optimum)
by something smaller than1
, which would work in your particular casey=x**2
. But this will not work in general.To be completely safe, 'bracket' your optimum point with a smaller value and a larger value, and use the derivative to find the next guess. But ensure that your next guess does not go outside the bracketed interval. If it does, or if the convergence of your guesses is too slow, use another method such as bisection or golden mean search.
Of course, this means your 'very naïve gradient descent' algorithm is too naïve to work in general. That's why real optimization routines are more complicated.