scipy.optimize.curve_fit for a function with complex dependence on variable parameters

1.6k views Asked by At

I'm relatively new to using Python to fit my data, so please excuse my lack of programming finesse. I have, however, been unable to find a solution to the errors my current curve fitting attempts are throwing. I believe these errors are due to my model function's complex dependence on one of the two variable parameters (namely, Kd). I would appreciate insight regarding what specifically is causing this problem and how I may adjust my definition or fitting package to avoid it. Minimal working example to follow:

# Import libraries
import scipy as scipy
from scipy import stats
import numpy as np
from scipy.optimize import curve_fit

np.set_printoptions(precision=4)
ConcSyringeTotal = 9.5 ## total monomer concentration in the syringe [M]tot, in mM
Vinj = 10 ## volume injected in each injection, in uL
Vinit = 1250 ## volume of solvent initially in the sample cell, in uL
Vcell = 1000 ## cell volume, in uL (only the heat change within this volume is measured)
Injections = np.arange(2.00,26.00,1.00)
print Injections
Energy = np.array([136.953, 105.119, 84.414, 69.373, 60.898, 52.813, 46.187, 39.653, 33.894, 29.975, 27.315, 24.200, 21.643, 19.080, 16.158, 13.454, 13.218, 11.568, 10.742, 9.547, 8.693, 7.334, 6.111, 4.741])
print Energy

def DimerDissociation(injection, Kd, DHd): ## a dimer dissociation model for an ITC dilution experiment
    ## returns the heat flow (y-data, in ucal) per injection (x-data, unitless)
    ## fit for the dissociation constant (Kd, in mM = mmol/L, umol/mL, nmol/uL) 
    ## and the enthalpy of dissociation (DHd, in ucal/nmol = kcal/mol)
    ##
    ## concentration (in mM) of the free monomer in the cell after equilibration of the i-th injection
    VolumeAdded = 6+(injection-1)*Vinj ## in uL
    VolumeTotal = Vinit + VolumeAdded ## in uL
    CellTotal = ConcSyringeTotal*VolumeAdded ## Total in the cell after the i-th injection, in nmol
    ConcCellTotal = CellTotal/VolumeTotal ## Total concentration in the cell after the i-th injection, in mM
    ConcCellMonomer_roots =  np.roots([1, Kd/2, -Kd*ConcCellTotal/2]) 
    ConcCellMonomer_real = ConcCellMonomer_roots.real[abs(ConcCellMonomer_roots.imag)<1e-5]
    ConcCellMonomer_positive = ConcCellMonomer_real[ConcCellMonomer_real>0]
    ConcCellMonomer = ConcCellMonomer_positive[ConcCellMonomer_positive<ConcCellTotal]
    ##
    ## concentration (in mM) of the free monomer in the syringe
    ConcSyringeMonomer_roots =  np.roots([1, Kd/2, -Kd*ConcSyringeTotal/2]) 
    ConcSyringeMonomer_real = ConcSyringeMonomer_roots.real[abs(ConcSyringeMonomer_roots.imag)<1e-5]
    ConcSyringeMonomer_positive = ConcSyringeMonomer_real[ConcSyringeMonomer_real>0]
    ConcSyringeMonomer = ConcSyringeMonomer_positive[ConcSyringeMonomer_positive<ConcSyringeTotal]
    ## nmol of the free monomer injected from the syringe
    SyringeMonomerInjected = Vinj*ConcSyringeMonomer[0]
    ##
    ## concentration (in mM) of the free monomer in the cell before the i-th injection
    VolumeAddedPre = 6+(injection-2)*Vinj
    VolumeTotalPre = Vinit + VolumeAddedPre
    CellTotalPre = ConcSyringeTotal*VolumeAddedPre
    ConcCellTotalPre = CellTotalPre/VolumeTotalPre
    ConcCellMonomerPre_roots =  np.roots([1, Kd/2, -Kd*ConcCellTotalPre/2]) 
    ConcCellMonomerPre_real = ConcCellMonomerPre_roots.real[abs(ConcCellMonomerPre_roots.imag)<1e-5]
    ConcCellMonomerPre_positive = ConcCellMonomerPre_real[ConcCellMonomerPre_real>0]
    ConcCellMonomerPre = ConcCellMonomerPre_positive[ConcCellMonomerPre_positive<ConcCellTotalPre]
    ## nmol of the free monomer in the cell before the i-th injection
    CellMonomerPre = VolumeTotalPre*ConcCellMonomerPre[0]
    ##
    ## concentration of the free monomer before equilibration of the i-th injection, in mM
    ConcCellMonomerBefore = (CellMonomerPre+SyringeMonomerInjected)/VolumeAdded
    ## concentration of the free monomer after equilibration of the i-th injection, in mM
    ConcCellMonomerAfter = ConcCellMonomer[0]
    ## change in concentration of the free monomer over the equilibration of the i-th injection, in mM
    ConcCellMonomerChange = ConcCellMonomerAfter - ConcCellMonomerBefore
    ##
    return Vcell*DHd*ConcCellMonomerChange
DimerDissociation_opt, DimerDissociation_cov = curve_fit(DimerDissociation, Injections, Energy, p0=[0.4,10])
DimerDissociation_stdev = np.sqrt(np.diag(DimerDissociation_cov))
print "optimized parameters:", DimerDissociation_opt
print "covariance matrix:", DimerDissociation_cov
print "standard deviation of fit parameters:", DimerDissociation_stdev

And the associated error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-38-b5ef2361feed> in <module>()
     52     ##
     53     return Vcell*DHd*ConcCellMonomerChange
---> 54 DimerDissociation_opt, DimerDissociation_cov = curve_fit(DimerDissociation, Injections, Energy, p0=[0.4,10])
     55 DimerDissociation_stdev = np.sqrt(np.diag(DimerDissociation_cov))
     56 print "optimized parameters:", DimerDissociation_opt

//anaconda/lib/python2.7/site-packages/scipy/optimize/minpack.pyc in curve_fit(f, xdata, ydata, p0, sigma, absolute_sigma, **kw)
    553     # Remove full_output from kw, otherwise we're passing it in twice.
    554     return_full = kw.pop('full_output', False)
--> 555     res = leastsq(func, p0, args=args, full_output=1, **kw)
    556     (popt, pcov, infodict, errmsg, ier) = res
    557 

//anaconda/lib/python2.7/site-packages/scipy/optimize/minpack.pyc in leastsq(func, x0, args, Dfun, full_output, col_deriv, ftol, xtol, gtol, maxfev, epsfcn, factor, diag)
    367     if not isinstance(args, tuple):
    368         args = (args,)
--> 369     shape, dtype = _check_func('leastsq', 'func', func, x0, args, n)
    370     m = shape[0]
    371     if n > m:

//anaconda/lib/python2.7/site-packages/scipy/optimize/minpack.pyc in _check_func(checker, argname, thefunc, x0, args, numinputs, output_shape)
     18 def _check_func(checker, argname, thefunc, x0, args, numinputs,
     19                 output_shape=None):
---> 20     res = atleast_1d(thefunc(*((x0[:numinputs],) + args)))
     21     if (output_shape is not None) and (shape(res) != output_shape):
     22         if (output_shape[0] != 1):

//anaconda/lib/python2.7/site-packages/scipy/optimize/minpack.pyc in _general_function(params, xdata, ydata, function)
    443 
    444 def _general_function(params, xdata, ydata, function):
--> 445     return function(xdata, *params) - ydata
    446 
    447 

<ipython-input-38-b5ef2361feed> in DimerDissociation(injection, Kd, DHd)
     19     CellTotal = ConcSyringeTotal*VolumeAdded ## Total in the cell after the i-th injection, in nmol
     20     ConcCellTotal = CellTotal/VolumeTotal ## Total concentration in the cell after the i-th injection, in mM
---> 21     ConcCellMonomer_roots =  np.roots([1, Kd/2, -Kd*ConcCellTotal/2])
     22     ConcCellMonomer_real = ConcCellMonomer_roots.real[abs(ConcCellMonomer_roots.imag)<1e-5]
     23     ConcCellMonomer_positive = ConcCellMonomer_real[ConcCellMonomer_real>0]

//anaconda/lib/python2.7/site-packages/numpy/lib/polynomial.pyc in roots(p)
    199     """
    200     # If input is scalar, this makes it an array
--> 201     p = atleast_1d(p)
    202     if len(p.shape) != 1:
    203         raise ValueError("Input must be a rank-1 array.")

//anaconda/lib/python2.7/site-packages/numpy/core/shape_base.pyc in atleast_1d(*arys)
     47     res = []
     48     for ary in arys:
---> 49         ary = asanyarray(ary)
     50         if len(ary.shape) == 0 :
     51             result = ary.reshape(1)

//anaconda/lib/python2.7/site-packages/numpy/core/numeric.pyc in asanyarray(a, dtype, order)
    512 
    513     """
--> 514     return array(a, dtype, copy=False, order=order, subok=True)
    515 
    516 def ascontiguousarray(a, dtype=None):

ValueError: setting an array element with a sequence.
2

There are 2 answers

0
rjonnal On

I was not able to reproduce your error. The first problem I noticed is your use of np.roots. roots(p) returns the roots of a polynomial specified by the coefficients in p, specifically p[0] + p[1] * x + p[2] * x**2 + .... Your third coefficient, -Kd*ConcCellTotal/2 is a function of injections, which was an array. There's no documented signature for np.roots that allows an array to be passed as one of the members of p.

Can you edit and clarify?

-Ravi

P.S. A toy example demonstrating how curve_fit works:

import numpy as np
from scipy.optimize import curve_fit

x_in = np.array([-3.0,-2.0,-1.0,0.0,1.0,2.0,3.0])

def f(x,a,b):
    return a*x+b

y_in = f(x_in,3,2)
parameters_fit,cov = curve_fit(f,x_in,y_in)
y_out = parameters_fit[0]*x_in+parameters_fit[1]
print parameters_fit
print y_in
print y_out

y_in = f(x_in,10,75)
parameters_fit,cov = curve_fit(f,x_in,y_in)
y_out = parameters_fit[0]*x_in+parameters_fit[1]
print parameters_fit
print y_in
print y_out

The objective function f takes as arguments an array of x-values and one or more parameters. curve_fit takes as arguments the objective function, an array of x-values x_in, and an array of y-values y_in as arguments. It then makes up some values for the parameters a and b and evaluates the objective function on x_in, which gives an array y_out. It computes the RMS error between y_in and y_out and then tweaks its values of a and b until the RMS error is minimized.

The devil is really in the details of how initial values for a and b are selected (if they're not supplied, as the OP did) and how they're tweaked. That's complicated, but not absolutely necessary for us scipy.optimize users to understand perfectly.

7
shortorian On

The problem is that numpy.curve_fit passes the xdata to your objective function as an array. This means that all the operations on injection in DimerDissociation are actually array operations. Consequently, ConcCellTotal is also an array (check this by inserting print type(ConcCellTotal) at line 27 in your code). That means your call to np.roots looks like np.roots([scalar, scalar, array]), which is the source of the error.

I always get turned around when I work with these things, but I think the idea is that the objective function for the optimizer should be completely vectorized; each time it's called, it needs to return an array with one energy value for every injection value.

I fixed the error above by explicitly making ConcCellMonomer_roots an array, and I also threw in some naive reporting on variable states:

def DimerDissociation(injection, Kd, DHd): 
    print 'Called DimerDissociation'
    VolumeAdded = 6.0+(injection-1.0)*Vinj ## in uL
    VolumeTotal = Vinit + VolumeAdded ## in uL
    CellTotal = ConcSyringeTotal*VolumeAdded ## Total in the cell after the i-th injection, in nmol
    ConcCellTotal = CellTotal/VolumeTotal ## Total concentration in the cell after the i-th injection, in mM
    print 'total\t',np.shape(ConcCellTotal)
    ConcCellMonomer_roots =  np.asarray([np.roots([1.0, Kd/2.0, -Kd*i/2.0]) for i in ConcCellTotal])
    print 'roots\t',np.shape(ConcCellMonomer_roots)
    ConcCellMonomer_real = ConcCellMonomer_roots.real[abs(ConcCellMonomer_roots.imag)<1e-5]
    print 'real\t',np.shape(ConcCellMonomer_real)
    ConcCellMonomer_positive = ConcCellMonomer_real[ConcCellMonomer_real>0]
    print 'positive\t',np.shape(ConcCellMonomer_positive)
    ConcCellMonomer = ConcCellMonomer_positive[ConcCellMonomer_positive<ConcCellTotal]
    print 'monomer\t',np.shape(ConcCellMonomer)

I also made the corresponding correction to ConcCellMonomerPre_roots using np.asarray. With those edits, I get the optimizer to iterate a few times until ConcCellMonomer_roots contains some imaginary values. Once that happens, ConCellMonomer_real is no longer the same shape as ConcCellTotal, so the line ConcCellMonomer_positive[ConcCellMonomer_positive<ConcCellTotal] throws a broadcast error. The calls to DimerDissociation give this output:

Called DimerDissociation
total   (24,)
roots   (24, 2)
real    (48,)
positive(24,)
monomer (24,)

Until the last iteration:

Called DimerDissociation
total   (24,)
roots   (24, 2)
real    (4,)
positive(4,)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Anaconda\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 540, in runfile
    execfile(filename, namespace)
  File "C:/Users/Devin/Documents/Python Scripts/SO.py", line 66, in <module>
    DimerDissociation_opt, DimerDissociation_cov = curve_fit(DimerDissociation, Injections, Energy, p0=[0.4,10])
  File "C:\Anaconda\lib\site-packages\scipy\optimize\minpack.py", line 533, in curve_fit
    res = leastsq(func, p0, args=args, full_output=1, **kw)
  File "C:\Anaconda\lib\site-packages\scipy\optimize\minpack.py", line 378, in leastsq
    gtol, maxfev, epsfcn, factor, diag)
  File "C:\Anaconda\lib\site-packages\scipy\optimize\minpack.py", line 444, in _general_function
    return function(xdata, *params) - ydata
  File "C:/Users/Devin/Documents/Python Scripts/SO.py", line 35, in DimerDissociation
    ConcCellMonomer = ConcCellMonomer_positive[ConcCellMonomer_positive<ConcCellTotal]
ValueError: operands could not be broadcast together with shapes (4) (24) 

Hopefully this puts you on the right track, although I'm not an expert here and someone else might have some much better ideas.