When assessing the covariance matrix with scipy.optimize.minimize for a simple linear least squares (LLS) problem, I get a significant deviation when using the L-BFGS-B method. After a lot of research, I have created a detailed MCVE to highlight the issue.
MCVE
The handy curve_fit method
Let's say we want to fit the following model to some trial dataset:
import numpy as np
from scipy import optimize
def model(x, a, b, c):
    return a * x[:, 0]**2 + b * x[:, 0] + c
np.random.seed(12345)
X = np.linspace(-1, 1, 30).reshape(-1, 1)
y = model(X, 3, 2, 1)
s = 0.1 * np.ones(y.size)
n = s * np.random.normal(size=y.size)
yn = y + n
popt, pcov = optimize.curve_fit(model, X, yn, sigma=s, absolute_sigma=True)
popt, pcov
# (array([3.00876769, 2.00038216, 1.03068484]),
# array([[ 3.29272571e-03, -1.11752553e-11, -1.17327007e-03],
# [-1.11752553e-11, 9.35483883e-04, 2.98424026e-12],
# [-1.17327007e-03, 2.98424026e-12, 7.51395072e-04]]))
The curve_fit method directly provides the covariance matrix, which we take as the expected result. Now we want to reproduce this covariance matrix using different optimize methods.
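For reference, the 1-sigma parameter uncertainties follow from the diagonal of this matrix (a quick check added for comparison, not part of the original fit):
perr = np.sqrt(np.diag(pcov))
# ≈ [0.0574, 0.0306, 0.0274]  (standard errors of a, b, c)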
Check with baseline method least_squares
Using the least_squares function (which curve_fit uses under the hood) together with the recipe scipy uses to robustly assess the covariance leads to consistent results:
def residuals_factory(x, y, sigma=1.):
    def wrapped(beta):
        return (y - model(x, *beta)) / sigma
    return wrapped
residuals = residuals_factory(X, yn, sigma=s)
sol0 = optimize.least_squares(residuals, x0=[1, 1, 1])
# message: `xtol` termination condition is satisfied.
# success: True
# status: 3
# fun: [-5.954e-01 9.965e-02 ... -3.855e-01 9.455e-01]
# x: [ 3.009e+00 2.000e+00 1.031e+00]
# cost: 16.317663438370303
# jac: [[-1.000e+01 1.000e+01 -1.000e+01]
# [-8.668e+00 9.310e+00 -1.000e+01]
# ...
# [-8.668e+00 -9.310e+00 -1.000e+01]
# [-1.000e+01 -1.000e+01 -1.000e+01]]
# grad: [ 1.070e-07 -1.632e-06 1.722e-07]
# optimality: 1.632403261453419e-06
# active_mask: [ 0.000e+00 0.000e+00 0.000e+00]
# nfev: 4
# njev: 3
# Covariance recipe used by curve_fit: pseudo-inverse of J^T J via SVD.
# Singular values go into `sv` so the sigma vector `s` defined above is not overwritten.
U, sv, Vh = np.linalg.svd(sol0.jac, full_matrices=False)
tol = np.finfo(float).eps * sv[0] * max(sol0.jac.shape)
w = sv > tol
cov = (Vh[w].T / sv[w]**2) @ Vh[w]
# array([[ 3.29272571e-03, 5.77282993e-12, -1.17327008e-03],
# [ 5.77282993e-12, 9.35483875e-04, 1.14915683e-12],
# [-1.17327008e-03, 1.14915683e-12, 7.51395089e-04]])
In this context we can reproduce the covariance matrix.
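A quick consistency check against the curve_fit result (the tolerance is an arbitrary but comfortable choice):
np.allclose(cov, pcov, atol=1e-9)
# True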
Check with minimize method
Using minimize, we can perform the same operation by providing the Chi-squared loss function (half the sum of squared weighted residuals) instead of the residuals.
def loss_factory(x, y, sigma=1.):
    def wrapped(beta):
        return 0.5 * np.sum(np.power((y - model(x, *beta)) / sigma, 2))
    return wrapped
loss = loss_factory(X, yn, sigma=s)
sol1 = optimize.minimize(loss, x0=[1, 1, 1])
# message: Optimization terminated successfully.
# success: True
# status: 0
# fun: 16.31766343837042
# x: [ 3.009e+00 2.000e+00 1.031e+00]
# nit: 5
# jac: [ 1.907e-06 1.192e-06 -4.768e-07]
# hess_inv: [[ 3.293e-03 1.104e-11 -1.173e-03]
# [ 1.104e-11 9.355e-04 -1.734e-12]
# [-1.173e-03 -1.734e-12 7.514e-04]]
# nfev: 32
# njev: 8
Again we are able to get the covariance matrix (see hess_inv). This also works as expected with the BFGS method.
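For reference, explicitly requesting BFGS gives the same kind of dense hess_inv (a minimal sketch; with no bounds the default method above is already BFGS):
sol_bfgs = optimize.minimize(loss, x0=[1, 1, 1], method="BFGS")
sol_bfgs.hess_inv  # dense (3, 3) array, comparable to the covariance matrices above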
Issue with L-BFGS-B method
But when we use the L-BFGS-B method, the returned inverse Hessian does not agree with the previously computed covariance matrices:
sol2 = optimize.minimize(loss, x0=[1, 1, 1], method="L-BFGS-B")
# message: CONVERGENCE: REL_REDUCTION_OF_F_<=_FACTR*EPSMCH
# success: True
# status: 0
# fun: 16.317663438454282
# x: [ 3.009e+00 2.000e+00 1.031e+00]
# nit: 8
# jac: [-5.720e-05 2.917e-04 -4.182e-04]
# nfev: 36
# njev: 9
# hess_inv: <3x3 LbfgsInvHessProduct with dtype=float64>
sol2.hess_inv.todense()
# array([[ 0.03719202, 0.02371079, -0.02612526],
# [ 0.02371079, 0.02215422, -0.01929008],
# [-0.02612526, -0.01929008, 0.01984616]])
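Taking the square root of the diagonal makes the disagreement explicit (a quick check on the values printed above):
np.sqrt(np.diag(sol2.hess_inv.todense()))
# ≈ [0.193, 0.149, 0.141], far from the ~[0.057, 0.031, 0.027] obtained previously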
I wonder why the inverse Hessian is so different in this configuration, and how I can adapt this code to get the covariance matrix as I did in the three previous snippets.
Related questions
Below are some related questions I found during my research that do not answer the current question:
- How to compute standard deviation errors with scipy.optimize.least_squares
- In Scipy how and why does curve_fit calculate the covariance of the parameter estimates
- https://stats.stackexchange.com/questions/38115/how-do-i-interpret-the-covariance-matrix-from-a-curve-fit
- How to get Hessian Matrix from python minimize function?
Question
Why does minimize with method='L-BFGS-B' return such a different inverse Hessian, and how should I fix it in order to get a proper covariance matrix?
As @NickODell pointed out, the L-BFGS-B method computes a limited-memory approximation of the Hessian. There is an m switch on the underlying function fmin_l_bfgs_b, but it is not available in the minimize signature, so we cannot control how precise this approximation is. As a workaround, the Hessian matrix can be computed after minimizing:
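For instance, a minimal sketch using a central finite-difference Hessian of the loss (the numerical_hessian helper and the step size eps are illustrative choices, not scipy API):
def numerical_hessian(f, x, eps=1e-5):
    # Central finite-difference approximation of the Hessian of f at x.
    x = np.asarray(x, dtype=float)
    n = x.size
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            ei = np.zeros(n)
            ej = np.zeros(n)
            ei[i] = eps
            ej[j] = eps
            H[i, j] = (f(x + ei + ej) - f(x + ei - ej)
                       - f(x - ei + ej) + f(x - ei - ej)) / (4 * eps**2)
    return H

H = numerical_hessian(loss, sol2.x)
cov2 = np.linalg.inv(H)  # inverse of the actual Hessian, not of the L-BFGS-B approximation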
This is consistent with the other results.
Alternative:
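One possibility (a sketch; the numerical_jacobian helper is an illustrative assumption, not scipy API) is to rebuild the Gauss-Newton approximation J^T J of the Hessian from the weighted residuals at the L-BFGS-B solution and invert it:
def numerical_jacobian(f, x, eps=1e-8):
    # Forward-difference Jacobian of a vector-valued function f at x.
    x = np.asarray(x, dtype=float)
    f0 = f(x)
    J = np.zeros((f0.size, x.size))
    for j in range(x.size):
        dx = np.zeros(x.size)
        dx[j] = eps
        J[:, j] = (f(x + dx) - f0) / eps
    return J

J = numerical_jacobian(residuals, sol2.x)
cov3 = np.linalg.inv(J.T @ J)  # Gauss-Newton covariance, comparable to pcov from curve_fit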