When assessing the covariance matrix with scipy.optimize.minimize for a simple linear least squares (LLS) problem, I get a significant deviation when using the L-BFGS-B method. After a lot of research, I have created a detailed MCVE to highlight the issue.
MCVE
The handy curve_fit method
Let's say we want to fit the following model to some trial dataset:
import numpy as np
from scipy import optimize
def model(x, a, b, c):
return a * x[:, 0]**2 + b * x[:, 0] + c
np.random.seed(12345)
X = np.linspace(-1, 1, 30).reshape(-1, 1)
y = model(X, 3, 2, 1)
s = 0.1 * np.ones(y.size)
n = s * np.random.normal(size=y.size)
yn = y + n
popt, pcov = optimize.curve_fit(model, X, yn, sigma=s, absolute_sigma=True)
popt, pcov
# (array([3.00876769, 2.00038216, 1.03068484]),
# array([[ 3.29272571e-03, -1.11752553e-11, -1.17327007e-03],
# [-1.11752553e-11, 9.35483883e-04, 2.98424026e-12],
# [-1.17327007e-03, 2.98424026e-12, 7.51395072e-04]]))
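For reference, the 1-sigma parameter uncertainties follow directly from the diagonal of this matrix (a quick cross-check added here for clarity):
np.sqrt(np.diag(pcov))  # 1-sigma uncertainties on (a, b, c)
# ≈ [0.0574, 0.0306, 0.0274]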
The curve_fit method directly provides the covariance matrix, which is taken as the expected result. Now we want to reproduce this covariance matrix using different optimize methods.
Check with the baseline method least_squares
Using the least_squares function (which is used by curve_fit under the hood) and the recipe scipy uses to robustly assess the covariance leads to compliant results:
def residuals_factory(x, y, sigma=1.):
def wrapped(beta):
return (y - model(x, *beta)) / sigma
return wrapped
residuals = residuals_factory(X, yn, sigma=s)
sol0 = optimize.least_squares(residuals, x0=[1, 1, 1])
# message: `xtol` termination condition is satisfied.
# success: True
# status: 3
# fun: [-5.954e-01 9.965e-02 ... -3.855e-01 9.455e-01]
# x: [ 3.009e+00 2.000e+00 1.031e+00]
# cost: 16.317663438370303
# jac: [[-1.000e+01 1.000e+01 -1.000e+01]
# [-8.668e+00 9.310e+00 -1.000e+01]
# ...
# [-8.668e+00 -9.310e+00 -1.000e+01]
# [-1.000e+01 -1.000e+01 -1.000e+01]]
# grad: [ 1.070e-07 -1.632e-06 1.722e-07]
# optimality: 1.632403261453419e-06
# active_mask: [ 0.000e+00 0.000e+00 0.000e+00]
# nfev: 4
# njev: 3
U, sv, Vh = np.linalg.svd(sol0.jac, full_matrices=False)  # sv renamed to avoid shadowing the sigma array s
tol = np.finfo(float).eps * sv[0] * max(sol0.jac.shape)
w = sv > tol
cov = (Vh[w].T / sv[w]**2) @ Vh[w]
# array([[ 3.29272571e-03, 5.77282993e-12, -1.17327008e-03],
# [ 5.77282993e-12, 9.35483875e-04, 1.14915683e-12],
# [-1.17327008e-03, 1.14915683e-12, 7.51395089e-04]])
In this context we can reproduce the covariance.
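For reference, this SVD recipe is just a numerically robust way of inverting the normal matrix JᵀJ; on a well-conditioned problem like this one, a direct computation (shown here only as a cross-check) gives the same matrix:
cov_direct = np.linalg.inv(sol0.jac.T @ sol0.jac)  # (JᵀJ)⁻¹, the classic Gauss-Newton covariance
np.allclose(cov, cov_direct)  # True, up to floating point noise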
Check with the minimize method
Using minimize, we can perform the same operation by providing the Chi-Squared loss function instead of the residuals.
def loss_factory(x, y, sigma=1.):
def wrapped(beta):
return 0.5 * np.sum(np.power((y - model(x, *beta)) / sigma, 2))
return wrapped
loss = loss_factory(X, yn, sigma=s)
sol1 = optimize.minimize(loss, x0=[1, 1, 1])
# message: Optimization terminated successfully.
# success: True
# status: 0
# fun: 16.31766343837042
# x: [ 3.009e+00 2.000e+00 1.031e+00]
# nit: 5
# jac: [ 1.907e-06 1.192e-06 -4.768e-07]
# hess_inv: [[ 3.293e-03 1.104e-11 -1.173e-03]
# [ 1.104e-11 9.355e-04 -1.734e-12]
# [-1.173e-03 -1.734e-12 7.514e-04]]
# nfev: 32
# njev: 8
Again we are able to get the covariance matrix (see hess_inv). This also works as expected with the BFGS method.
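For completeness, a quick check of that claim (sol_bfgs is just a local name used for this check):
sol_bfgs = optimize.minimize(loss, x0=[1, 1, 1], method="BFGS")
sol_bfgs.hess_inv  # dense inverse Hessian, again close to pcov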
Issue with the L-BFGS-B method
But when we use the L-BFGS-B method, the returned inverse Hessian does not agree with the previously computed covariance matrices:
sol2 = optimize.minimize(loss, x0=[1, 1, 1], method="L-BFGS-B")
# message: CONVERGENCE: REL_REDUCTION_OF_F_<=_FACTR*EPSMCH
# success: True
# status: 0
# fun: 16.317663438454282
# x: [ 3.009e+00 2.000e+00 1.031e+00]
# nit: 8
# jac: [-5.720e-05 2.917e-04 -4.182e-04]
# nfev: 36
# njev: 9
# hess_inv: <3x3 LbfgsInvHessProduct with dtype=float64>
sol2.hess_inv.todense()
# array([[ 0.03719202, 0.02371079, -0.02612526],
# [ 0.02371079, 0.02215422, -0.01929008],
# [-0.02612526, -0.01929008, 0.01984616]])
I wonder why the inverse Hessian is so different in this configuration, and how I can adapt this code to get the covariance matrix as I did in the three previous snippets.
Related questions
Below are some related questions I found during my research, but they do not answer the current question:
- How to compute standard deviation errors with scipy.optimize.least_squares
- In Scipy how and why does curve_fit calculate the covariance of the parameter estimates
- https://stats.stackexchange.com/questions/38115/how-do-i-interpret-the-covariance-matrix-from-a-curve-fit
- How to get Hessian Matrix from python minimize function?
Question
Why does minimize with method='L-BFGS-B' return such a different inverse Hessian, and how should I fix it in order to get a proper covariance matrix?
As @NickODell pointed out, the L-BFGS-B method computes a limited-memory approximation of the Hessian. There is an m switch on the underlying function fmin_l_bfgs_b, but it is not exposed in the minimize signature, so we cannot control how precise this approximation is. As a workaround, the Hessian matrix can be computed after minimizing:
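A minimal sketch of such a workaround, assuming the third-party numdifftools package is available:
import numdifftools as nd  # third-party package, assumed installed

H = nd.Hessian(loss)(sol2.x)  # numerical Hessian of the Chi-Squared loss at the solution
cov2 = np.linalg.inv(H)       # its inverse reproduces the covariance matrix (close to pcov)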
This is consistent with the other results.
Alternative:
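For instance, a scipy-only sketch that differentiates the gradient with approx_fprime (the hessian helper below is a hypothetical name introduced here):

def hessian(f, x, eps=1e-5):
    # Finite-difference Hessian of a scalar function f at x (rough step size eps)
    x = np.asarray(x, dtype=float)
    grad = lambda p: optimize.approx_fprime(p, f, eps)
    H = np.array([optimize.approx_fprime(x, lambda p, i=i: grad(p)[i], eps)
                  for i in range(x.size)])
    return 0.5 * (H + H.T)  # symmetrize

cov3 = np.linalg.inv(hessian(loss, sol2.x))  # again close to pcov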