Troubleshooting an objective function with scipy.optimize.minimize


I am trying to solve an optimization problem applied to bridges, in the context of parametric design. The objective is to split precast barriers into a minimum number of unique barriers. There are two kinds of barriers: normal barriers and barriers with light poles. I have shared an image of a bridge with 3 spans.

The parameters are as follows:

The light pole barrier width is equal to w.

The light pole barriers are evenly spaced by the parameter s.

The light pole starting point is defined by the variable a.

The spans of the bridge are [l1, l2, ..., ln].

x is the width that all of the resulting distances have to be divisible by.

The idea is to use as many barriers of width x as possible, while keeping the remaining lengths needed to fill the spans to a minimum. I have to get all the distances [a, b, c, ..., f] into an array of equations (see picture), to which I also append the light distance between the special barriers (s - w), and have all of them divisible by x. I also added constants (the trailing elements of the arr array) to account for the case where a remaining length is not divisible by x, so we can compensate with a different barrier (basically, we use that constant to subtract).
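For example, with my initial guess in the code below (x = 1500, w = 1500, arr[2] = 6250), the first distance is m[0] = arr[2] - w/2 = 6250 - 750 = 5500, and 5500 % 1500 = 1000, so that term alone contributes 1000 to the penalty np.sum(m % x).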

Here is a sketch of what I am talking about: [bridge sketch]

from scipy.optimize import minimize
import numpy as np

s = 12500
spans = np.array([25450, 25100, 25450])
ejs = np.zeros(len(spans) - 1)  # expansion or separation joints; ignore for now

min_bw, max_bw = 0, 1200  # light pole barrier width

consts = abs(len(spans) * 2 - 1)  # Number of distance elements [b, c, d, ...]
init = np.array([1500, 1500, 6250])  # x, w, a + w/2

n = (spans - s / 2) // s  # consecutive spacings of barriers for a given span
m = np.ones(len(spans) * 2)  # the length of the equation array that has to be divisible

def objective(arr, n, s, spans, ejs):

    x = arr[0]
    w = arr[1]
    a = arr[2]  # (the offset is to the mid-axis of the light pole)
    m = np.ones(len(spans) * 2 + 1)  # We make room for the last constraint h = (s - w)
    m[0] = a - w / 2

    for i in range(1, len(spans) * 2):
        j = (i - 1) // 2
        if i % 2 == 0:
            m[i] = (s - w) - (m[i - 1] + ejs[j] + arr[2 + i])
        else:
            m[i] = spans[j] - (m[i - 1] + n[j] * s + w + arr[2 + i])
    m[-1] = s - w  # We also need the light distance between two poles to be divisible by x
    # print(m)
    return np.sum(m % x)

# Bounds
bounds = np.array([(1500, 2100), (1500, 2600), (3000, s)])
extras = np.repeat([(min_bw, max_bw)], consts, axis=0) # Add the bounds for each additional element
all_bounds = np.concatenate((bounds, extras), axis=0)


initial = np.pad(init, (0, consts), 'constant', constant_values=min_bw)

sol = minimize(fun=objective, x0=initial, bounds=all_bounds, method='Nelder-Mead', args=(n, s, spans, ejs))
print(np.round(sol.x))

# m = [5500., 5950., 5050., 6050., 4950., 6500., 11000.]  # output of the print(m) statement

The problem that I can't seem to figure out is that np.sum(m % x) does not return 0! I get the right distances in the m array compared to the drawing, but the divisibility constraints are not satisfied even though the optimizer reports that it terminated successfully. I'd appreciate any insights! Thank you so much!


There are 2 answers

Answer by Reinderien (accepted)

The main problems that Nick calls out are true (though not comprehensive): there are many local minima, and the objective is non-differentiable.

But just as importantly: your objective isn't well-suited as an objective at all - it should be a constraint; and this entire system (when re-constructed) is linear. That means it's time for integer linear programming, which has better guarantees of convergence and accuracy than a generic minimize (no matter which method you select). More specifically, you're doing constraint programming, where there is no objective at all. (Or maybe you have some idea of what a better objective should be, such as minimizing the coefficients of x).

In addition to the variables for x, w, a, your "distance elements" and m, you also need a binary selection matrix of which multiple of x is equal to m. This gets around the problem that multiplication of x by some multiple to get m is non-linear, by using so-called big-M constraints.
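To see the big-M trick in isolation before the full model, here is a minimal toy sketch (a made-up instance of my own, not the bridge problem): pick x and m inside simple bounds such that m equals some integer multiple i*x, with one binary selector b_i per candidate multiple.

import numpy as np
from scipy.optimize import milp, Bounds, LinearConstraint

# Toy feasibility problem: find x in [10, 20] and m in [55, 60]
# with m an exact integer multiple of x.
mults = np.arange(2, 7)  # candidate multiples i = 2..6
k = len(mults)
M = 1_000                # big-M: larger than any |m - i*x| within the bounds

# Variable vector: [x, m, b_2, ..., b_6], the b_i binary selectors
rows, lbs = [], []
for idx, i in enumerate(mults):
    sel = np.zeros(k)
    sel[idx] = -M
    rows.append(np.concatenate(([-i, 1], sel)))  # m - i*x - M*b_i >= -M
    rows.append(np.concatenate(([i, -1], sel)))  # i*x - m - M*b_i >= -M
    lbs += [-M, -M]
big_m = LinearConstraint(np.array(rows), lb=lbs, ub=np.inf)

# Exactly one selector must be on
one_hot = LinearConstraint(
    np.concatenate(([0, 0], np.ones(k))).reshape(1, -1), lb=1, ub=1)

res = milp(
    c=np.zeros(2 + k),  # no objective: pure feasibility
    integrality=np.concatenate(([0, 0], np.ones(k))),  # only the b_i are integral
    bounds=Bounds(lb=np.concatenate(([10, 55], np.zeros(k))),
                  ub=np.concatenate(([20, 60], np.ones(k)))),
    constraints=(big_m, one_hot),
)
print(res.x[:2], mults[res.x[2:].round().astype(bool)])

Both inequalities are slack (by M) when b_i = 0 and collapse to m == i*x when b_i = 1; the full model below does exactly this, just with one selector row per m-variable.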

I've tried to make this easy-to-follow, and at any point you can call some_constraint.A.toarray() and view the result in e.g. PyCharm's matrix visualiser to see what's happening; but you may also need to dig into some introductory material on linear programming to get a better handle on it. Anyway, the results seem sane.

import numpy as np
import scipy.sparse as sp
from scipy.optimize import milp, Bounds, LinearConstraint

s = 12_500
spans = np.array((25_450, 25_100, 25_450))
ejs = np.zeros(len(spans) - 1)  # expansion or separation joints
min_bw, max_bw = 0, 1200  # light pole barrier width
consts = spans.size*2 - 1  # Number of distance elements [b,c,d...]
m_count = spans.size*2 + 1
n = (spans - s/2)//s  # consecutive spacings of barriers for a given span

# In the old solution's objective for the initial guess, m/x ranges from 3.6 to 7.3. Ballpark 20 as a maximum.
mul_count = 20
multiples = np.arange(1, 1 + mul_count)[:, np.newaxis]

'''
Variables: x, w, a, consts, m, and then
a m_count * mul_count matrix of selectors of integer multiples of m.
Each selector is binary (0 or 1), and indicates times 1, times 2, ... times mul_count respectively.
'''

# Constraint matrix for all m equations, starting with the 'm' term itself
m_A = sp.hstack((
    sp.csc_array((m_count, 3 + consts)),
    -sp.eye(m_count),
    sp.csc_array((m_count, m_count*mul_count)),
), format='lil')

# Column positions: x, w, a, consts, m, mi
# First row: a - w/2 == m0: -0.5w + a - m == 0
m_A[0, 1:3] = -0.5, 1
# Middle rows
m_A[1:, 1] = -1  # -w (middle and last row)
m_A[1:-1, 3: 3+consts] = -sp.eye(consts)  # -consts
m_A[1:-1, 3+consts: 3+consts+m_count - 2] -= sp.eye(m_count - 2)  # -m[i - 1]

m_rhs = np.empty(m_count)
# First row
m_rhs[0] = 0
# Even rows: m[i] = s - w - m[i-1] - ejs[(i - 1)//2] - consts
#            -s + ejs[(i - 1)//2] = -w - m[i] - m[i-1] - consts
m_rhs[2:-1:2] = ejs - s
# Odd rows: m[i] = spans[(i - 1)//2] - (m[i - 1] + n[(i - 1)//2] * s + w + arr[2 + i])
#            -spans[(i - 1)//2] + n[(i - 1)//2]*s = -m[i] - m[i - 1] - w - consts
m_rhs[1:-1:2] = n*s - spans
# Last row: m[-1] = s - w
#           -w - m[-1] = -s
m_rhs[-1] = -s

constraint_m = LinearConstraint(A=m_A, lb=m_rhs, ub=m_rhs)
# To see the constraint matrix: view constraint_m.A.toarray() in a matrix viewer

# Exactly one selector per m-variable must be on
constraint_selection = LinearConstraint(
    A=sp.hstack((
        sp.csc_array((m_count, 3 + consts + m_count)),
        sp.kron(
            A=sp.eye(m_count),
            B=np.ones(mul_count),
            format='csc',
        )
    )),
    lb=np.ones(m_count),
    ub=np.ones(m_count),
)

'''
Enforce the multiple constraint:
m >= x*i                     m <= x*i       (if mi is selected)
-i*x + m >= 0                i*x - m >= 0   (if mi is selected)
-i*x + m + M(1 - mi) >= 0    i*x - m + M(1 - mi) >= 0
-i*x + m - M*mi >= -M        i*x - m - M*mi >= -M
Column positions: x, w, a, consts, m, mi
'''
ix_m = sp.hstack((
    np.tile(multiples, (m_count, 1)),
    sp.csc_array((m_count * mul_count, 2 + consts)),
    sp.kron(
        -sp.eye(m_count),
        np.ones((mul_count, 1)),
    ),
))
M = 1e6  # big-M constraint coefficient
mmi = -M*sp.eye(m_count * mul_count)
constraint_mul = LinearConstraint(
    A=sp.bmat((
        ( ix_m, mmi),
        (-ix_m, mmi),
    )),
    lb=np.full(shape=2*m_count*mul_count, fill_value=-M),
    ub=np.inf,
)


result = milp(
    c=np.zeros(3 + consts + m_count + m_count*mul_count),  # There is no optimization objective
    integrality=np.concatenate((
        np.zeros(3 + consts + m_count),  # x, w, a, consts and m continuous
        np.ones(m_count * mul_count),    # mi integral
    )),
    bounds=Bounds(
        lb=np.concatenate((
            (1_500, 1_500, 3_000),  # x, w, a
            np.full(shape=consts, fill_value=min_bw),  # consts
            np.zeros(m_count + m_count*mul_count),     # m, mi
        )),
        ub=np.concatenate((
            (2_100, 2_600, s),  # x, w, a
            np.full(shape=consts, fill_value=max_bw),   # consts
            np.full(shape=m_count, fill_value=np.inf),  # m unbounded
            np.ones(m_count*mul_count),  # mi are binary, so 0-1
        )),
    ),
    constraints=(
        constraint_m,
        constraint_selection,
        constraint_mul,
    ),
)
assert result.success, result.message
(x, w, a), distances, m, mi = np.split(result.x, (3, 3+consts, 3+consts+m_count))
mi = mi.reshape((m_count, mul_count))
multipliers = (mi @ multiples).ravel()

print(result.message)
print(f'x = {x}')
print(f'w = {w}')
print(f'a = {a}')
print(f'distance elements = {distances}')
print(f'm = {m}')
print(f'm/x quotients: {m/x}')
print(f'multipliers = {multipliers}')
print('multiply selectors =')
print(mi)
Output:

Optimization terminated successfully. (HiGHS Status 7: Optimal)
x = 1650.0
w = 2600.0
a = 6250.0
distance elements = [450.   0. 100.   0. 450.]
m = [4950. 4950. 4950. 4950. 4950. 4950. 9900.]
m/x quotients: [3. 3. 3. 3. 3. 3. 6.]
multipliers = [3. 3. 3. 3. 3. 3. 6.]
multiply selectors =
[[0. 0. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]]

The problem is of fairly small size and high sparsity, so it solves quickly. You can visualise the constraint sparsity pattern via:

import matplotlib.pyplot as plt

plt.imshow(
    np.sign(
        sp.vstack((
            constraint_m.A,
            constraint_selection.A,
            constraint_mul.A,
        )).toarray()
    ),
)
plt.axis('scaled')
plt.show()

[constraint sparsity pattern plot]

Answer by Nick ODell

You're running into two problems.

  1. m % x has a derivative that confuses the optimizer.
  2. The problem you're solving has lots of local minima.

Let's start by giving the optimizer more information about how to optimize your function.

Confusing derivative

The function minimize() is a local optimizer. If there is another minimum that it would have to climb up a hill to get to, it cannot reach it.

Let's say you wanted to minimize the function f(x) = x % 100, with x bounded between 50 and 150. Here is a graph of this function.

[graph of f(x) = x % 100]

Suppose the optimizer starts at x = 90. It looks at the derivative, and moves down the slope to the left. Then, it gets stuck at x = 50. The same thing is happening in your problem.
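You can reproduce this with a quick sketch (the exact stopping point depends on the method, but it never climbs over the hill to reach x = 100):

from scipy.optimize import minimize

# Local optimizer on f(x) = x % 100, starting at x = 90
res = minimize(lambda x: x[0] % 100, x0=[90.0],
               bounds=[(50, 150)], method='Nelder-Mead')
print(res.x)  # ends up near the lower bound at x = 50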

What you can do to solve this is to create a function that rewards the optimizer for getting closer to x = 100. I came up with the following function: a cosine wave that gets close to zero when variable is a near-multiple of target.

def modular_difference(variable, target):
    # Move valley very slightly forward
    overshoot_distances_amount = 1e-8
    distance_to_next = variable % target / target
    distance_to_next -= overshoot_distances_amount
    ret = -np.cos(2*np.pi*distance_to_next) + 1
    return ret

(Note: The purpose of overshoot_distances_amount is explained later.)

This gives the following graph:

[graph of the cosine approximation]

This tells the optimizer it should move toward x=100 if it is at x=99.
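A quick numeric check makes the shape concrete (illustrative values, using the modular_difference() defined above):

# Worst point, just before a multiple, and just past one
for x in (50.0, 99.0, 100.000001):
    print(x, modular_difference(x, 100))
# -> roughly 2.0, then ~0.002, then ~0.0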

I integrated this new function into your code in the following way.

First, I changed the original objective function so that it returns both m and x. Call this objective_inner().

def objective_inner(arr, n, s, spans, ejs):
    x = arr[0]
    w = arr[1]
    a = arr[2]  # (the offset is to the mid-axis of the light pole)
    m = np.ones(len(spans) * 2 + 1) # We make room for the last constraint h=(s-w)
    m[0] = a - w / 2 

    for i in range(1, len(spans) * 2):
        j = (i - 1) // 2
        if i % 2 == 0:
            m[i] = (s - w) - (m[i - 1] + ejs[j] + arr[2 + i])
        else:
            m[i] = spans[j] - (m[i - 1] + n[j] * s + w + arr[2 + i])
    m[-1] = s - w  # We also need the light distance between two poles to be divisible by x
    return m, x

Then, I wrote two versions of the objective function.

def objective_differentiable(arr, n, s, spans, ejs):
    m, x = objective_inner(arr, n, s, spans, ejs)
    return modular_difference(m, x).sum()


def objective_old(arr, n, s, spans, ejs):
    m, x = objective_inner(arr, n, s, spans, ejs)
    return np.sum(m%x)

The first function is a version of your objective that has nice derivatives, using the cosine penalty above. The second function is the same as your original objective.

However, fixing the derivative is not sufficient by itself. It still gets stuck in local minima.

Global optimizers

Some functions cannot be optimized by local optimizers, because the optimizer gets trapped in a local minimum. This is one of them.

To get around this, I used the basinhopping function, which uses a local optimizer, but also takes large steps in random directions, with the hope of finding a better minimum.

I found the following worked well.

from scipy.optimize import basinhopping, minimize

minimizer_kwargs = dict(bounds=all_bounds, method='Nelder-Mead', args=(n, s, spans, ejs))
sol = basinhopping(objective_differentiable, x0=initial, minimizer_kwargs=minimizer_kwargs, niter=100)
print("basinhopping sol", sol)
sol = minimize(fun=objective_old, x0=sol.x, bounds=all_bounds, method='Nelder-Mead', args=(n, s, spans, ejs))
print("minimize true sol", sol)

This does the following:

  • First, it uses basinhopping with Nelder-Mead as the local optimizer, optimizing the differentiable version of your objective.
  • Then, it passes the solution this found to your original objective. I did this so that it is easy to compare whether it is finding a better solution than the original. (Note: this is why overshoot_distances_amount exists in the code above. This picks a solution which is slightly after the point where the modulo wraps around, which makes it easier for the optimizer to find the exact minimum of objective_old().)

This can find solutions which get a score of 1e-4 or less on your original objective.

Here's one of the solutions it finds:

array([1501.43791458, 1989.93457535, 5499.28104559,  449.99999353,
          0.000001  ,  100.00001273,    0.00000361,  450.00000564])