Python, Lmfit, globalfit

511 views Asked by At

I am trying to do a global fit using 1 shared parameter and 6 other parameters. dataComplete is my data parsed from a text file and dose = x values. Here, the bValue changes per the concentration and others stay the same. There are some calculations using the parameters before they are subjected to the global fit. I get two errors

1: nan values and 2: TypeError: Improper input: N=35 must not exceed M=22

I checked individual elements and they seem to work, but not the global fit. Any help will be much appreciated. I tried to use code from a previously solved example here, but obviously something is wrong.

##file parser
## coded elsewhere

dataComplete = [[90.47, 91.6, 89.83, 68.58, 56.68, 41.93, 1.09, 17.0, 9.19, 1.84, 0.1, 0.38, -0.37, 0.47, 0.65, -0.01, 0.36, 0.18, 0.29, 0.12, 0.87, -1.26], [108.07, 115.49, 102.48, 104.09, 110.15, 72.64, 49.11, 37.35, 24.94, 14.33, 6.31, 3.07, 9.81, 0.2, 0.74, 0.59, 1.11, 0.44, 0.34, 1.03, 0.06, 1.08], [100.93, 118.82, 89.76, 116.32, 127.06, 97.5, 69.71, 47.17, 44.16, 22.15, 18.6, 9.64, 2.56, 2.02, 0.93, 0.29, 1.11, -0.04, 0.27, -0.23, 1.0, 0.18], [99.87, 110.15, 92.21, 92.59, 96.8, 92.69, 81.39, 70.09, 44.08, 39.48, 7.66, 21.49, 2.96, 5.04, 1.87, 2.49, 1.26, 0.53, 0.65, 0.97, 1.15, 0.52], [102.14, 97.94, 94.85, 94.48, 101.22, 94.81, 111.42, 94.96, 61.2, 33.17, 29.37, 11.33, 23.67, 15.97, 6.52, 2.4, 2.29, 0.27, 0.08, 0.31, 0.09, -0.09]]

dose = [-5.0, -5.30103, -5.60206, -5.90309, -6.20412, -6.50515, -6.80618, -7.10721, -7.40824, -7.70927, -8.0103, -8.31133, -8.61236, -8.91339, -9.21442, -9.51545, -9.81648, -10.11751, -10.41854, -10.71957, -11.0206, -11.32163]



def objective(params,xdata,ydata):
    resid = 0.0*ydata[:]
    for i in range(5):
        logEc50 = 10**(params['logEc50_%i' %(i+1)].value)
        bValue = params['bValue_%i' %(i+1)].value
        pA2 = params['pA2_%i' %(i+1)].value
        schildSlope = params['schildSlope_%i' %(i+1)].value
        top = params['top_%i' %(i+1)].value
        bottom = params['bottom_%i' %(i+1)].value
        hillSlope = params['hillSlope_%i' %(i+1)].value

        ec50 = 10**(logEc50)
        Antag = 1+(bValue/(10**(-1*pA2)))**schildSlope
        LogEC=np.log((ec50*Antag))

        resid[i, :] =bottom + (top-bottom)/(1+10**((LogEC-dose)*hillSlope))
    return resid[i, :] - ydata[i, :]


params = Parameters()
molar = [1.25e-7,2.5e-8,1e-8,5e-9,0]

for i in range(5):
    params.add('logEc50_%i' % (i+1),value=-6.96)
    params.add('bValue_%i' % (i+1),value=molar[i])
    params.add('pA2_%i' % (i+1),value=7.48)
    params.add('schildSlope_%i' % (i+1),value=1)
    params.add('bottom_%i' % (i+1),value=-0.2)
    params.add('top_%i' % (i+1),value=109.496)
    params.add('hillSlope_%i' % (i+1),value=1)


#data
dataComplete = [[]]*5
dataComplete[0]=conc1
dataComplete[1]=conc2
dataComplete[2]=conc3
dataComplete[3]=conc4
dataComplete[4]=conc5


print dataComplete
dataComplete = np.array(dataComplete)
assert(dataComplete.shape) ==(5,22)

#x data
dose = np.array(dose)

#fitting
result = minimize(objective,params,args=(dose,dataComplete))

lmfit.printfuncs.report_fit(result.params)
1

There are 1 answers

2
M Newville On

I would guess that you're hitting numerical issues with lots of powers of 10s and logs. For example, with

logEc50_i ~= 7

you then raise that to the power 10 with

logEc50 = 10**(params['logEc50_%i' %(i+1)].value)

then raise that to the power 10:

ec50 = 10**(logEc50)

(is doing that twice just a mistake?)

Then you multiply this by Antag that is the ratio of two numbers ~1e-7 that should result in something close to unity, and then take the log of that:

LogEC=np.log((ec50*Antag))

I think you really want to use properties of logarithms like

log(a*b) = log(a) + log(b)
log(x**y) = y*log(x)

here to prevent numerical instabilities.

I didn't see which parameter was meant to be shared between datasets, but it does look like you intend to fit 5 separate datasets with 7 variables for each -- is one of these meant to be constrained to have the same value for all datasets? If you only have 22 observations total it doesn't much matter whether that is 35 variables or 31 variables, there still are not enough total observation to fit that many variables.

But it looks like you really have 22 observations for each of the five datasets. I think what you might be running into is that you need to evaluate the residual for each dataset and then concatenate the result, so that the resdual array for all five datasets has 110 elements.