How to turn a pygam model into a SciPy BSpline

730 views Asked by At

I would like to get the best of two worlds: Scipy and pygam. Specifically, I want to penalise the third derivative of my data using a custom pygam penalty function, and then I would like to use the result as a Scipy BSpline. The latter allows for easy calculation of the n-th derivative of the data. Of course, gam.coef_ gives me the coefficients of my model, but what are the knots? If get close by using gam.terms.get_coef_indices()-1, but the result should really be identical to gam.predict(x), i.e.

After a helpful comment by ev-br, we know now that we can get the knot edges by pygam.utils.get_knot_edges. Therefore, a promising piece of code would be

plt.figure()
res0=pygam.LinearGAM(terms=pygam.terms.s(0),n_splines=len(x),penalties=MyPenalty,lam=0.3,fit_intercept=False, basis='ps').fit(x,y,1/np.sqrt(w))
plt.plot(x,y,'x',zorder=10, label='data')
knot_edges=utils.gen_edge_knots(x,dtype='numerical')
knots=np.linspace(knot_edges[0],knot_edges[-1],len(res0.coef_))
plt.plot(x,res0.predict(x), label='PyGAM fit')
plt.plot(x,scipy.interpolate.BSpline(knots,res0.coef_,3)(x), label='BSpline from PyGAM parameters')
   
    
plt.legend()

Alas, the output is not very satisfying: enter image description here

2

There are 2 answers

1
ev-br On

According to https://github.com/dswah/pyGAM/issues/265, pyfam fits uniform knots. The location of the first and last knots can be extracted via pygam.utils.gen_edge_knots.

3
itpme On

Unfortunately B-spline methods are not standardized (I have developed five different B-spline fitting algorithms over forty years, and none are compatible with any others!), and close examination of the pygam and scipy results show that unfortunate incompatibility. First, the knots are either shifted and scaled (scipy splprep) or not (pygam), and second, the B-spline coordinates and coefficients are incompatible due to differing methods of computation. Although it may be possible to convert between them, it may prove to be almost intractable.

Another point is that if you want to use smoothing, splprep provides this (agreed, it may not be as easy and elegant to use as the pygam method).

Here is a method (using numerical differentiation) that may be suitable for pygam, compared with the spldev derivative for a set of tree growth data.

import numpy as np
import scipy
from scipy.interpolate import interp1d, splprep, splev
from scipy.sparse import csr_matrix
import matplotlib.pyplot as plt
import pandas as pd
import math

from pygam import LinearGAM, s, f, utils
import pygam as pygam

X = [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24]

Y = [5,5,5.045,5.234,5.369,5.558,5.693,
     5.819,5.864,5.882,5.882,5.882,5.882,
     5.882,5.906,6.074,6.398,6.674,6.842,
     6.998,7.07,7.082,7.082,7.082
    ]


plt.figure()
# Plot data
plt.plot(X,Y,'x',zorder=10, label='data')

# First, pygam
GAM=pygam.LinearGAM(terms=pygam.terms.s(0),n_splines=len(X),
                     lam=0.3,fit_intercept=False,
                     basis='ps').fit(X,Y)

knot_edges=utils.gen_edge_knots(X,dtype='numerical')
knots=np.linspace(knot_edges[0],knot_edges[-1],len(GAM.coef_))
print('GAM Knots')
print(knots)
coeffs = GAM.coef_
print('GAM Coefficients')
print(coeffs)
GAMY = GAM.predict(X)
plt.plot(X,GAMY, label='PyGAM fit')
print('GAMY fit')
print(GAMY)

# Second, splprep
tck, u = splprep([X,Y],s=0) # force interpolation by s=0
# where t = knots (scaled 0 to 1)
#       c = coefficients
#       k = degree of spline (default = 3)
#       u = An array of the values of the parameter.
new_points = splev(u,tck, der=0)
plt.plot(new_points[0], new_points[1], '.', color='k', label='splprep fit')
print('tck')
print(tck)
print('u')
print(u)

#Ynew = scipy.interpolate.BSpline(knots,res0.coef_,3)
#plt.plot(X,Ynew)
         #(X,label='BSpline from PyGAM parameters'))
   
plt.legend()
plt.show()

# Derivatives, splev
dev_points_x, dev_points_y = splev(u, tck, der=1) # first derivative
plt.plot(u*23+1, dev_points_y/23, '--ro', label='splev derivative')
#u and y must be scaled and shifted

# Deriv, GAM (by numerical difference method)
XX = GAM.generate_X_grid(term=0, n=200)
#now do derivative(s)
Yprime = np.zeros(200) ## Y prime
GAMY = GAM.predict(XX) # fitted Y on XX
for i in range(1,len(Yprime)-1):
    Yprime[i] = (GAMY[i+1] - GAMY[i-1])/(XX[i+1] - XX[i-1])
# compute central difference on XX[i]
# now fix up Yp1[0] and Yp1[end]
#Yp1[0] = Yp1[1] # just substitution
Yprime[0] =  (GAMY[1] - GAMY[0])/(XX[1] - XX[0])  # forward difference
#Yp1[len(Yp1)-1] = Yp1[len(Yp1) - 2] # just substitution
last = len(Yprime)-1
Yprime[last] = (GAMY[last] - GAMY[last-1])/(XX[last] - XX[last-1]) # backward diff


plt.plot(XX,Yprime,'-b',label='GAM numerical deriv' )

plt.legend()
plt.show()

Data and fitted curves

Derivatives via pygam and splprep