How to estimate error propagation on regressed function from covariance matrix using scipy.curve_fit?


I need a hand with something. I'm using curve_fit from SciPy to fit a curve with three parameters. When I use it, I also get pcov, which I know is the covariance matrix of the fitted parameters. But here's the thing: can I use pcov to figure out how much my predictions might be off?

I know I can calculate the error between the real data and the fitted curve, but can I somehow use pcov to get that error variance directly for a new data point I'm predicting?

Thanks in advance


There is 1 answer

Answered by jlandercy:

Provided your model is continuous and smooth and the covariance matrix is a good approximation (see @Nick ODell's comments), you can use the first-order error propagation method to estimate the first-order error on your regression with respect to the parameter covariance.
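Concretely, this is the delta method: for a model y = f(x; p), the first-order variance of the regressed curve at a point x is Var[y(x)] ≈ J(x) Cp J(x)^T, where J(x) is the row vector of partial derivatives of f with respect to the parameters (evaluated at the fitted parameters) and Cp is the parameter covariance matrix returned as pcov.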

Let's say you have the following dataset and model:

import numpy as np
import numdifftools as nd
from scipy import optimize, stats
import matplotlib.pyplot as plt

def model(x, a, b, c):
    return a * x + b / (x**2 + 1) + c

# Synthetic dataset: exact model values plus homoscedastic Gaussian noise
np.random.seed(123456)
x = np.linspace(-1, 1, 20)
p = (3, 2, 1)                           # true parameters
y = model(x, *p)
s = np.full_like(y, 0.15)               # known noise level (one sigma)
n = s * np.random.normal(size=y.size)
y += n

# absolute_sigma=True: sigma is taken as absolute errors, so pcov is not rescaled
popt, pcov = optimize.curve_fit(model, x, y, sigma=s, absolute_sigma=True)
# (array([3.01353034, 1.92219962, 1.01378869]),
#  array([[ 3.05357145e-03,  7.04390835e-11, -6.30392759e-11],
#         [ 7.04390835e-11,  3.92488970e-02, -3.02487191e-02],
#         [-6.30392759e-11, -3.02487191e-02,  2.44373750e-02]]))
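As a side note, the square roots of the diagonal of pcov already give the one-sigma uncertainties on the parameters themselves (the values below are read off the pcov printed above):

perr = np.sqrt(np.diag(pcov))   # standard errors of a, b, c
# approximately array([0.0553, 0.1981, 0.1563])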

We define the variance propagation function as follows:

def variance(model, x, p, Cp):
    """First-order propagation of the parameter covariance Cp to the model output."""

    def proxy(q):
        return model(x, *q)

    def projection(J):
        # Variance of the output at a single point: J Cp J^T
        return J @ Cp @ J.T

    # Jacobian of the model wrt the parameters, one row per point of x
    Jp = nd.Gradient(proxy)(p)
    Cy = np.apply_along_axis(projection, 1, Jp)

    return Cy
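If you would rather not depend on numdifftools, the Jacobian can also be approximated with plain central finite differences. This is a minimal sketch (the helper name variance_fd and the step size eps are my own choices), equivalent to the function above up to numerical accuracy:

def variance_fd(model, x, p, Cp, eps=1e-6):
    # Jacobian of the model wrt the parameters, built column by column
    p = np.asarray(p, dtype=float)
    J = np.empty((x.size, p.size))
    for k in range(p.size):
        dp = np.zeros_like(p)
        dp[k] = eps
        J[:, k] = (model(x, *(p + dp)) - model(x, *(p - dp))) / (2 * eps)
    # Diagonal of J Cp J^T, i.e. the variance of the curve at each point of x
    return np.einsum("ij,jk,ik->i", J, Cp, J)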

We can then estimate the curve:

xlin = np.linspace(x.min(), x.max(), 200)
yhat = model(xlin, *popt)

And assess the confidence interval:

alpha = 0.01
z = stats.norm.ppf(1 - alpha / 2.)       # two-sided normal quantile for a 99% band
Cy = variance(model, xlin, popt, pcov)
sy = np.sqrt(Cy)                         # pointwise standard error of the regressed curve
ci = z * sy
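If you only need the error at a few new points rather than on the whole grid, the same helper applies; the x values below are arbitrary examples (passing a 1-D array of several points keeps the Jacobian two-dimensional):

xnew = np.array([-0.5, 0.0, 0.35])
ynew = model(xnew, *popt)
snew = np.sqrt(variance(model, xnew, popt, pcov))
for xi, yi, si in zip(xnew, ynew, snew):
    print(f"x = {xi:+.2f} -> y = {yi:.3f} +/- {z * si:.3f} (99%)")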

Graphically, this leads to:

fig, axe = plt.subplots()
axe.scatter(x, y, marker=".")
axe.plot(xlin, yhat)
axe.plot(xlin, yhat + ci, "--", color="k")
axe.plot(xlin, yhat - ci, "--", color="k")
axe.grid()

[Figure: noisy data points with the fitted curve and the 99% confidence band (dashed)]

This gives an envelope within which the regressed curve is expected to lie (under the smoothness and continuity assumptions), based on the first-order variation of the parameters within their uncertainty.
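Keep in mind that this envelope is a confidence band on the regressed curve itself. If you want the expected spread of a new noisy observation (a prediction interval), the measurement variance has to be added to the propagated parameter variance; a minimal sketch, assuming the same homoscedastic noise level of 0.15 used when generating the data:

pi = z * np.sqrt(Cy + 0.15**2)           # curve variance plus observation variance
axe.plot(xlin, yhat + pi, ":", color="grey")
axe.plot(xlin, yhat - pi, ":", color="grey")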