Solving the Freudenstein and Roth test function using gradient descent with Armijo line search


I am trying to minimize the Freudenstein and Roth test function, which is given by

f(x, y) = f1(x, y)^2 + f2(x, y)^2, where

f1(x, y) = −13 + x + ((5 − y) y − 2) y,
f2(x, y) = −29 + x + ((y + 1) y − 14) y,

using gradient descent with Armijo backtracking line search, with the parameters alpha and beta set to 0.5 and the initial step size t = 1.
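
For reference, a quick sanity check of the definitions above shows that both residuals vanish at (x, y) = (5, 4), so the global minimum of f1^2 + f2^2 is 0 there (this snippet only restates the formulas above):

f1 = lambda x, y: -13 + x + ((5 - y) * y - 2) * y  # first residual, as defined above
f2 = lambda x, y: -29 + x + ((y + 1) * y - 14) * y  # second residual
print(f1(5, 4), f2(5, 4))  # prints 0 0, so f(5, 4) = 0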

So far I have written the following Python code:

import numpy as np

x = np.array([20,-18])

# Define the functions and gradient using lambda functions

f1 = lambda x: 5 * x[1]**2 - x[1]**3 - 2 * x[1] + x[0] - 13
f2 = lambda x: x[0] + x[1]**3 + x[1]**2 - 14 * x[1] - 29
f = lambda x: f1(x)**2 + f2(x)**2
grad_f = lambda x: np.array((f1(x)+f2(x) , f1(x)*(10*x[1]-3*x[1]**2-2)+f2(x)*(3*x[1]**2+2*x[1]-14)))

# Define parameters for Armijo backtracking line search

alpha = 0.5
beta = 0.5
d = grad_f(x)
while np.linalg.norm(grad_f(x)) > 1e-2:
    t = 1.0
# The sufficient decreasing condition for Armijo
while f(x) - f(x - t * d)  <  alpha * t * np.dot(grad_f(x), d):
    t = t*beta
    x = x - t *d
    print(x)

print("Optimal solution:", x)

Can anyone help me understand what might be wrong with my approach?

So far, I have tried playing with the initial point; even when I start from a point close to a stationary point, the code returns the same value every time.


1 Answer

Answer from Tino D:

I fixed some of the issues in your code; mainly, I defined the gradients of f1 and f2 separately for better readability and an easier implementation. Furthermore, I assume the indentation in your question is a mistake, so I used the correct indentation in my answer. I am also not sure why you used d instead of just grad_f in your code.
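
For reference, since f = f1^2 + f2^2, the chain rule gives grad f = 2 * (f1 * grad f1 + f2 * grad f2), and that is exactly what the gradF line below computes.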

Here is my code:

import numpy as np
%matplotlib notebook
import matplotlib.pyplot as plt

x = np.array([20, -18]) # starting point for the descent
f1 = lambda x: 5 * x[1]**2 - x[1]**3 - 2 * x[1] + x[0] - 13  # define f1
f2 = lambda x: x[0] + x[1]**3 + x[1]**2 - 14 * x[1] - 29  # define f2
f = lambda x: f1(x)**2 + f2(x)**2  # define f as a combination of f1 and f2
gradF1 = lambda x: np.array([1, 10 * x[1] - 3 * x[1]**2 - 2])  # gradient of f1
gradF2 = lambda x: np.array([1, 3 * x[1]**2 + 2 * x[1] - 14])  # gradient of f2
gradF = lambda x: 2 * np.array([f1(x), f2(x)]) @ np.array([gradF1(x), gradF2(x)]) # gradient of the combined function f with respect to x

# parameters for armijo
alpha = 0.5
beta = 0.5

iterations = []  # list to store iteration number
objectiveValues = []  # list to store objective values

# think also about breaking after a certain amount of iterations are done...
while np.linalg.norm(gradF(x)) > 1e-2: # condition to stop based on the gradient of f
    t = 1.0 # init t
    while f(x) - f(x - t * gradF(x)) < alpha * t * np.dot(gradF(x), gradF(x)): # sufficient decreasing condition for Armijo
        t = t * beta
    x = x - t * gradF(x) # update x
    iterations.append(len(iterations) + 1) # store value...
    objectiveValues.append(f(x)) # ==//==
    print("Iteration:", str(len(iterations)).zfill(5), "\tx:", x, "\tf(x):", f(x)) # print values of each iteration

# this is for plotting the overall way it was solved
plt.plot(iterations, objectiveValues, marker='o')
plt.xlabel('Iteration')
plt.ylabel('Objective function value')
plt.grid()
plt.yscale("log")
plt.show()
print("Optimal solution:", x)

The result of the approach as a plot:

[plot: objective function value vs. iteration number]

And the optimal solution:

Optimal solution: [11.42461124 -0.89610085]
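
Note that this point is a local minimizer: starting from (20, -18), plain gradient descent ends up at the local minimum, while the global minimum sits at (5, 4) with f = 0. A minimal check, reusing f from the code above (the variable names below are just for illustration):

xLocal = np.array([11.42461124, -0.89610085])
xGlobal = np.array([5.0, 4.0])
print(f(xLocal))   # about 49, the value at the local minimum
print(f(xGlobal))  # 0.0, the global minimum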

V2.0: faster computation by evaluating gradF only once per outer iteration, as nicely noted by Axel Kemper:

The approach remains largely the same, except for the lines that assign gradF(x) to d:

import numpy as np
%matplotlib notebook
import matplotlib.pyplot as plt

x = np.array([20, -18]) # starting point for the descent
f1 = lambda x: 5 * x[1]**2 - x[1]**3 - 2 * x[1] + x[0] - 13  # define f1
f2 = lambda x: x[0] + x[1]**3 + x[1]**2 - 14 * x[1] - 29  # define f2
f = lambda x: f1(x)**2 + f2(x)**2  # define f as a combination of f1 and f2
gradF1 = lambda x: np.array([1, 10 * x[1] - 3 * x[1]**2 - 2])  # gradient of f1
gradF2 = lambda x: np.array([1, 3 * x[1]**2 + 2 * x[1] - 14])  # gradient of f2
gradF = lambda x: 2 * np.array([f1(x), f2(x)]) @ np.array([gradF1(x), gradF2(x)]) # gradient of the combined function f with respect to x

# parameters for armijo
alpha = 0.5
beta = 0.5

iterations = []  # list to store iteration number
objectiveValues = []  # list to store objective values
d = gradF(x)
# think also about breaking after a certain amount of iterations are done...
while np.linalg.norm(d) > 1e-2: # condition to stop based on the gradient of f
    t = 1.0 # init t
    while f(x) - f(x - t * d) < alpha * t * np.dot(d, d): # sufficient decreasing condition for Armijo
        t = t * beta
    x = x - t * d # update x
    iterations.append(len(iterations) + 1) # store value...
    objectiveValues.append(f(x)) # ==//==
    # print("Iteration:", str(len(iterations)).zfill(5), "\tx:", x, "\tf(x):", f(x)) # print values of each iteration
    d = gradF(x)
    
# this is for plotting the overall way it was solved
plt.plot(iterations, objectiveValues, marker='o')
plt.xlabel('Iteration')
plt.ylabel('Objective function value')
plt.grid()
plt.yscale("log")
plt.show()
print("Optimal solution:", x)

The results and the plot remain the same, but this version spends much less time computing gradF and is therefore the nicer approach.
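
As the comment in the loop hints, it can also be worth guarding against the case where the tolerance is never reached. Here is a minimal sketch of the same loop with an added iteration cap, reusing the definitions above (maxIter and the cap value are my own assumptions, not part of the code above):

maxIter = 10000  # assumed safeguard, adjust as needed
d = gradF(x)
for k in range(maxIter):
    if np.linalg.norm(d) <= 1e-2:  # same stopping criterion as before
        break
    t = 1.0
    while f(x) - f(x - t * d) < alpha * t * np.dot(d, d):  # Armijo backtracking
        t = t * beta
    x = x - t * d  # gradient step
    d = gradF(x)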