How to fit a logistic distribution use a fixed location parameter?

101 views Asked by At

I use scipy.stats.logistic.fit() with fixed location value, to get an estimated scale parameter, while the residual does not hold normality from qqplot.

How should I improve it and how to visualize the errors? Is there other way to estimated the parameters for a logistic distribution?

For the data I used: shape (1000,150), the histogram shows a slightly right-tailed. use the maximum value in the dataset as location, and to estimate the parameter of scale.


Here is the code I used:

datalogitic= np.loadtxt(f"datalogiticfit.txt")
data=datalogitic[:,:51].flatten()
loc=data.max()
loc_lower=loc
loc_upper=loc
scale_lower=0
scale_upper=1000

dist = stats.logistic
bounds = [(loc_lower, loc_upper), (scale_lower,scale_upper)]
result = stats.fit(dist, data,bounds,method='mle')
print("Parameters of the Logistic Distribution are : ", result.params)
result.plot()
plt.show()

x = np.linspace(data.min(), data.max(), len(data))
cdf = stats.logistic.cdf(x, loc=result.params[0], scale=result.params[1])
plt.plot(x, cdf, label='Logistic CDF (Estimated Parameters)')
ecdf = ECDF(data)
plt.plot(ecdf.x, ecdf.y, label='Empirical CDF', linestyle="dashdot")
plt.legend()
plt.show()

When I looked at PDF of logistic provided in the page of scipy.stats.logistic:

f(x) = \frac{\exp(-x)}
       {(1+\exp(-x))^2}

The denominator part is not the same as found in the book Handbook of the Logistic Distribution (Statistics a Series of Textbooks and Monographs) (N. Balakrishnan).

f(x,u,s) = \frac{\exp(-(x-u)/s)}
           {s*(1+\exp(-(x-u)/s)))^2}

Or they are equivalent?

2

There are 2 answers

1
jlandercy On

Lets create a synthetic dataset:

import numpy as np
from scipy import stats, optimize

np.random.seed(123456)

law = stats.logistic(loc=1.2, scale=2.3)
data = law.rvs(30000)

You can fit with fixed parameters:

stats.logistic.fit(data, floc=1.2)
# (1.2, 2.30433453917283)

If you want to perform MLE by yourself, just state the likelihood:

def factory(x, loc=0.):
    def wrapped(p):
        return - stats.logistic(loc=loc, scale=p[0]).logpdf(x).sum()
    return wrapped

likelihood = factory(data, loc=1.2)

sol = optimize.minimize(likelihood, x0=[1])
#   message: Optimization terminated successfully.
#   success: True
#    status: 0
#       fun: 85067.14024628093
#         x: [ 2.304e+00]
#       nit: 7
#       jac: [ 0.000e+00]
#  hess_inv: [[ 1.238e-04]]
#      nfev: 16
#      njev: 8

Which also provides you uncertainty on parameter estimation (hess_inv).

0
Matt Haberland On

Or they are equivalent?

Yes. One is specific to location and variance of zero and one, respectively; the other includes them as parameters. Read the note in the SciPy documentation that begins "The probability density above is defined in the “standardized” form."

How should I improve it

Don't fix the location to the maximum of the data. If the location is truly known, fix it to the known value; if not, include it as a free parameter when fitting. The location of the distribution is its median, so 50% of the observations you'd draw would be less than the location. Therefore, the probability of all N observations being less than or equal to the location parameter is 1/2**N. Also, ignoring rounding due to double precision, the probability of an observation exactly matching the location parameters is zero. So if you think you know the location parameter, and if it happens to equal the maximum value of the data, I'd suggest checking your assumptions or your data collection; something is wrong.

and how to visualize the errors?

To visualize the errors, you can create several different plots after fitting with scipy.stats.goodness_of_fit. See the plot method of FitResult.

Is there other way to estimated the parameters for a logistic distribution?

Yes, there are three ways that are easily done in SciPy:

Method of L-moments is also possible, but it requires custom code.

Here is some code to fit the distribution, produce a Q-Q plot, and check goodness of fit using scipy.stats.goodness_of_fit.

import numpy as np
from scipy import stats
rng = np.random.default_rng(238923498234)

dist_family = stats.logistic

# simulate real data, which is not necessarily distributed
# according to logistic distribution.
loc, scale = 1.25, 3.25
data = rng.normal(loc=loc, scale=scale, size=1000)

# For guessing the parameters, use method of moments, which
# sets the parameters such that the distribution moments match
# the sample moments.

# Location was fixed in OP's original problem, so I'm fixing
# it here. However, in case a guess is needed, note that the
# distribution moment is equal to the location parameter.
sample_mean = np.mean(data)
l = sample_mean
# verify that they match
np.testing.assert_allclose(dist_family(loc=l).mean(), sample_mean)

# To get a guess of scale parameter, set the expression
# for the distribution variance equal to the sample variance
# and solve for the scale parameter.
sample_var = np.var(data, ddof=1)
s = np.sqrt(3*sample_var/np.pi**2)
# verify that they match
np.testing.assert_allclose(dist_family(loc=loc, scale=s).var(), sample_var)

# Fit using `logistic.fit`, fixing location at `loc` and
# providing `s` as a guess of the scale
params0 = dist_family.fit(data, floc=loc, scale=s)

# fit using `stats.fit`
bounds = {'loc': (loc, loc),
          'scale': (s/10, s*10)}  # usually sufficiently loose
dist = stats.fit(dist_family, data, bounds=bounds)
# consider trying `method=mse` for maximum product spacing estimate

# Check for agreement between `logistic.fit` and `stats.fit`
np.testing.assert_allclose(dist.params, params0)

# Produce Q-Q plot
dist.plot(plot_type='qq')

# Test goodness of fit using Anderson-Darling (AD) test
# Other methods are available; see documentation. 
# Use `known_params` to fix parameters are truly known;
# use `fit_parameters` for those that are fitted to data
res = stats.goodness_of_fit(dist_family, data,
                            known_params={'loc': loc},
                            fit_params={'scale': dist.params[1]},
                            statistic='ad')

res
# GoodnessOfFitResult(fit_result=params: FitParams(loc=1.25, scale=1.8552584333141091)
#                     success: True
#                     message: 'The fit was performed successfully.',
#                     statistic=1.5959097382901746, pvalue=0.1199, 
#                     null_distribution=array([0.28497412, 0.59373208, 0.86174867, ..., 0.69325478, 0.97925162, 1.70976138]))

enter image description here