Covariance deviation when using method L-BFGS-B of scipy.optimize.minimize

135 views Asked by At

When assessing covariance matrix with scipy.optimize.minimize for a simple LLS problem, I have got significant deviation when using the method L-BFGS-B. After a lot of research, I have created a detailed MCVE to highlight the issue.

MCVE

Then handy curve_fit method

Let say we want to fit the following model to some trial dataset:

import numpy as np
from scipy import optimize

def model(x, a, b, c):
    return a * x[:, 0]**2 + b * x[:, 0] + c

np.random.seed(12345)
X = np.linspace(-1, 1, 30).reshape(-1, 1)
y = model(X, 3, 2, 1)
s = 0.1 * np.ones(y.size)
n = s * np.random.normal(size=y.size)
yn = y + n

popt, pcov = optimize.curve_fit(model, X, yn, sigma=s, absolute_sigma=True)
popt, pcov

# (array([3.00876769, 2.00038216, 1.03068484]),
#  array([[ 3.29272571e-03, -1.11752553e-11, -1.17327007e-03],
#         [-1.11752553e-11,  9.35483883e-04,  2.98424026e-12],
#         [-1.17327007e-03,  2.98424026e-12,  7.51395072e-04]]))

The curve_fit method directly provides Covariance matrix which is considered as the expected result. Now we want to reproduce this covariance matrix using different optimize methods.

Check with baseline method least_squares

Using the least_squares function (which is used by curve_fit under the hood) and the recipe scipy use to robustly assess the covariance leads to compliant results:

def residuals_factory(x, y, sigma=1.):
    def wrapped(beta):
        return (y - model(x, *beta)) / sigma
    return wrapped

residuals = residuals_factory(X, yn, sigma=s)

sol0 = optimize.least_squares(residuals, x0=[1, 1, 1])
#     message: `xtol` termination condition is satisfied.
#     success: True
#      status: 3
#         fun: [-5.954e-01  9.965e-02 ... -3.855e-01  9.455e-01]
#           x: [ 3.009e+00  2.000e+00  1.031e+00]
#        cost: 16.317663438370303
#         jac: [[-1.000e+01  1.000e+01 -1.000e+01]
#               [-8.668e+00  9.310e+00 -1.000e+01]
#               ...
#               [-8.668e+00 -9.310e+00 -1.000e+01]
#               [-1.000e+01 -1.000e+01 -1.000e+01]]
#        grad: [ 1.070e-07 -1.632e-06  1.722e-07]
#  optimality: 1.632403261453419e-06
# active_mask: [ 0.000e+00  0.000e+00  0.000e+00]
#        nfev: 4
#        njev: 3

U, s, Vh = np.linalg.svd(sol0.jac, full_matrices=False)
tol = np.finfo(float).eps*s[0]*max(sol0.jac.shape)
w = s > tol
cov = (Vh[w].T/s[w]**2) @ Vh[w]

# array([[ 3.29272571e-03,  5.77282993e-12, -1.17327008e-03],
#        [ 5.77282993e-12,  9.35483875e-04,  1.14915683e-12],
#        [-1.17327008e-03,  1.14915683e-12,  7.51395089e-04]])

In this context we can reproduce the covariance.

Check with minimize method

Using minimize, we can perform the same operation by stating the Chi Squared loss function instead of residuals.

def loss_factory(x, y, sigma=1.):
    def wrapped(beta):
        return 0.5 * np.sum(np.power((y - model(x, *beta)) / sigma, 2))
    return wrapped

loss = loss_factory(X, yn, sigma=s)

sol1 = optimize.minimize(loss, x0=[1, 1, 1])
#  message: Optimization terminated successfully.
#  success: True
#   status: 0
#      fun: 16.31766343837042
#        x: [ 3.009e+00  2.000e+00  1.031e+00]
#      nit: 5
#      jac: [ 1.907e-06  1.192e-06 -4.768e-07]
# hess_inv: [[ 3.293e-03  1.104e-11 -1.173e-03]
#            [ 1.104e-11  9.355e-04 -1.734e-12]
#            [-1.173e-03 -1.734e-12  7.514e-04]]
#     nfev: 32
#     njev: 8

Again we are able to get the covariance matrix (see hess_inv). This also works as expected with method BFGS.

Issue with L-BFGS-B method

But when we use the L-BFGS-B method, the Inverse Hessian returned does not comply with previously computed covariance matrices:

sol2 = optimize.minimize(loss, x0=[1, 1, 1], method="L-BFGS-B")
#  message: CONVERGENCE: REL_REDUCTION_OF_F_<=_FACTR*EPSMCH
#  success: True
#   status: 0
#      fun: 16.317663438454282
#        x: [ 3.009e+00  2.000e+00  1.031e+00]
#      nit: 8
#      jac: [-5.720e-05  2.917e-04 -4.182e-04]
#     nfev: 36
#     njev: 9
# hess_inv: <3x3 LbfgsInvHessProduct with dtype=float64>

sol2.hess_inv.todense()
# array([[ 0.03719202,  0.02371079, -0.02612526],
#        [ 0.02371079,  0.02215422, -0.01929008],
#        [-0.02612526, -0.01929008,  0.01984616]])

I wonder why the inverse hessian is so different in this configuration and how can I adapt this code to get the covariance matrix as I did in the three previous snippets.

Related questions

Below some questions I have found that are related during research but does not answer the current question:

Question

Why the method minimize with option method='L-BFGS-B' returns a so different inverse hessian and how should I fix it in order to get a proper covariance matrix.

1

There are 1 answers

0
jlandercy On BEST ANSWER

As @NickODell pointed out, method L-BFGS-B compute a limited memory approximation of the Hessian.

There is a m switch on the underlying function fmin_l_bfgs_b, but it's not available in the minimize signature, therefore we cannot control how precise is this approximation.

As a workaround, Hessian matrix can be computed after minimizing:

from autograd import hessian

H = hessian(loss)(solution.x)
C = np.linalg.inv(H)

# array([[ 3.29272573e-03,  1.28301871e-19, -1.17327009e-03],
#        [ 2.44025127e-19,  9.35483871e-04, -6.92261149e-20],
#        [-1.17327009e-03, -3.24227332e-20,  7.51395089e-04]])

Which is compliant with other results.

Alternative:

import numdifftools

H = numdifftools.Hessian(loss)(solution.x)
C = np.linalg.inv(H)