Gradient descent stuck in local minima?

Question

Gradient descent stuck in local minima?

245 views Asked by Krellex At 15 November 2023 at 15:08

I'm running gradient descent to find a root for a system of nonlinear equations and I am wondering how you might detect if the method is stuck at the local minima, because I believe with the settings I am using this might be the case? my initial values are [-2, -1], tolerance of 10^-2 and 20 iterations. One thing I had read upon was that if the residual begins to flat line or begins to decrease incredibly slowly, it could be an indicator of the method being stuck in the local minima though, I am not entirely sure. I have graphed my residual with its iteration as the values of my iterates for each iteration and I'm wondering how I might know if it's stuck at the local minima.

def system(x):
    F = np.zeros((2,1), dtype=np.float64)
    F[0] = x[0]*x[0] + 2*x[1]*x[1] + math.sin(2*x[0])
    F[1] = x[0]*x[0] + math.cos(x[0]+5*x[1]) - 1.2
    return F
 
def jacb(x):
    J = np.zeros((2,2), dtype=np.float64)
    J[0,0] = 2*(x[0]+math.cos(2*x[0]))
    J[0,1] = 4*x[1]
    J[1,0] = 2*x[0]-math.sin(x[0]+5*x[1])
    J[1,1] = -5*math.sin(x[0]+5*x[1])
    return J


iterates, residuals = GradientDescent('system', 'jacb', np.array([[-2],[-1]]), 1e-2, 20, 0);

FullGradientDescent.py GradientDescentWithMomentum

I'm testing usually with 20 iterations but I did 200 to illustrate the slowing down of the residual

Marat suggested using GD with momentum. Code changes:

dn = 0
gamma = 0.8
dn_prev = 0
while (norm(F,2) > tol and n <= max_iterations):
    J = eval(jac)(x,2,fnon,F,*fnonargs)
    residuals.append(norm(F,2))
    
    dn = gamma * dn_prev+2*(np.matmul(np.transpose(J), F))
    dn_prev = dn

    lamb = 0.01       
    x = x - lamb * dn

Residual using GD with momentum

lastchance suggested doing a contour plot, this seems to show the behaviour of the algorithm but it still does not converge?

Original Q&A

There are 2 answers

**jlandercy** · Answer 1 · 2023-11-16T11:15:48+00:00

Rewrite your system as follow:

import numpy as np
from scipy import optimize
import matplotlib.pyplot as plt

def system(x):
    return np.array([
        x[0]*x[0] + 2*x[1]*x[1] + np.sin(2*x[0]),
        x[0]*x[0] + np.cos(x[0] + 5*x[1]) - 1.2
    ])

Create some room to plot contour around minima:

xlin = np.linspace(-2, 2, 200)
ylin = np.linspace(-2, 2, 200)
X, Y = np.meshgrid(xlin, ylin)
Z = system([X, Y])

Solve the system around potential solution you have identified:

solutions = np.array([
    optimize.fsolve(system, x0=[0, 0]),
    optimize.fsolve(system, x0=[-1, 0]),
    optimize.fsolve(system, x0=[-1, 0.5])
])

# array([[ 0.        ,  0.        ],
#       [-0.96352546, -0.06643926],
#       [-0.84056293,  0.37906057]])

Confirm returned points actually are solutions:

system(solutions.T).T

# array([[ 0.00000000e+00, -2.00000000e-01],   # Not converged
#       [ 8.88178420e-16,  6.66133815e-16],
#       [-2.22044605e-15, -4.88498131e-15]])

Plot the whole thing to confirm solutions:

fig, axe = plt.subplots()
axe.contour(X, Y, np.log10(np.sqrt(Z[0]**2 + Z[1]**2)), 30, cmap="jet")
axe.plot(*solutions.T, linestyle="none", marker="o")
axe.grid()

The first point (0,0) is not a solution, solver fails:

optimize.fsolve(system, x0=[0, 0], full_output=True)
# (array([0., 0.]),
# {'nfev': 15,
#  'fjac': array([[-1.00000000e+00, -2.54340978e-08],
#         [ 2.54340978e-08, -1.00000000e+00]]),
#  'r': array([-3.12495738e+12,  2.10079410e+06, -5.44853621e-02]),
#  'qtf': array([5.08681955e-09, 2.00000000e-01]),
#  'fvec': array([ 0. , -0.2])},
# 5,
# 'The iteration is not making good progress, as measured by the improvement from the last ten iterations.')

Based on documentation:

The solution (or the result of the last iteration for an unsuccessful call).

Based on those information, you can confirm your code did find one of the two roots of your system over the domains, the additional critical point at (0,0) is not solution.

**lastchance** · Answer 2 · 2023-11-15T18:11:57+00:00

Your two equations can be written as curves y(x):

y=+-sqrt( (-sin(2x)-x^2) / 2 )

y = (arccos(1.2-x^2)-x)/5

These are the blue and red lines, respectively, on the graph below. Note that both have two branches. The points of intersection are the two roots. The roots can be found by multi-dimensional Newton-Raphson:

import numpy as np
from math import sin, cos

#=======================================================================

def solver( eqns, x ):
    MAXITER = 1000
    TOL = 1.0e-10
    iter, residuals = 1, 1.0
    while iter <= MAXITER:
        print( x )
        F = Function( eqns, x )
        residuals = np.linalg.norm( F )
        if residuals <= TOL: break
        J = Jacobian( eqns, x )
        try: 
            deltax = np.linalg.solve( J, F )
        except np.LinAlgError:
            print( "Unable to solve" )
            break
        x = x - deltax
        iter += 1
    return x, residuals <= TOL

#=======================================================================

def Function( eqns, x ):
    F = np.zeros_like( x )
    for i, f in enumerate( eqns ): F[i] = f( x )
    return F

#=======================================================================

def Jacobian( eqns, x ):
    SMALL = 1.0e-6
    N = len( x )
    J = np.zeros( ( N, N ) )
    for i in range( N ):
       for j in range( N ):
           xplus  = x.copy();   xplus [j] = x[j] + 0.5 * SMALL
           xminus = x.copy();   xminus[j] = x[j] - 0.5 * SMALL
           J[i,j] = ( eqns[i]( xplus ) - eqns[i]( xminus ) ) / SMALL
    return J

#=======================================================================

eqns = [ lambda x: x[0] ** 2 + 2 * x[1] ** 2 + sin( 2 * x[0] ),
         lambda x: x[0] ** 2 + cos( x[0] + 5 * x[1] ) - 1.2   ]
x = np.array( [-0.8, 0.5 ] )
#x = np.array( [-0.8, -0.1 ] )          # alternative root
x, result = solver( eqns, x )
if result:
    print( "Root: ", x )
else:
    print( "No root found" )

#=======================================================================

For the first starting point you get:

[-0.8  0.5]
[-0.85082789  0.38764034]
[-0.8407059   0.37917738]
[-0.84056296  0.37906059]
[-0.84056293  0.37906057]
Root:  [-0.84056293  0.37906057]

For the second starting point you get:

[-0.8 -0.1]
[-1.01262867 -0.06737605]
[-0.96586428 -0.0665185 ]
[-0.96353139 -0.06643953]
[-0.96352546 -0.06643926]
[-0.96352546 -0.06643926]
Root:  [-0.96352546 -0.06643926]

TechQA.

Gradient descent stuck in local minima?

There are 2 answers

Related Questions in PYTHON

Related Questions in NUMERICAL-METHODS

Related Questions in GRADIENT-DESCENT

Related Questions in ROOT-FINDING

Popular Questions

Popular Tags

Trending Questions