Fitting a lognormal function to CDF in Python


I have a very large dataset that I need to run some statistical analysis on. The data is too large to read in all at once, so I only have the binned histogram to work from. In particular, I would like to fit the cumulative counts (i.e., the number of counts to the right of each point x in the histogram).

Here's a script I have which makes some mock data:

import numpy as np

mu, sigma = 0.3, 1.3
x1 = np.random.lognormal(mu, sigma, size=100000)  # random dist
bins = 10**np.arange(0, 4, 0.01)  # actual bins my real data uses

a, b = np.histogram(x1, bins=bins)  # a: counts per bin, b: bin edges

# calculating the cumulatives: counts in each bin plus every bin to its right
cum = np.cumsum(a[::-1])[::-1]

So the cumulative I want to fit looks like the following:

import matplotlib.pyplot as plt

plt.clf()
plt.loglog(b[:-1], cum)
plt.xlabel("Amps")
plt.ylabel("# Occurrences/Year")
plt.show()

Plot of Cumulative which I need to fit

My questions are as follows:

1) How do I fit a lognormal to the cumulative? I see that scipy.stats.lognorm.fit takes the original dataset as its argument, which I can't load.

2) I see from this Stack Overflow question that you can 'restore' the data from the histogram. I'd like to work from the cumulative instead. Is this the right approach?
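To make question 1 concrete, here is a minimal sketch of one way the cumulative could be fitted directly. It is an illustration under stated assumptions rather than an established recipe. It assumes the cumulative count at each left bin edge is roughly N * P(X > x), where N (the total number of events, including any that fall below the first bin) is treated as a free parameter. It fits in log space with scipy.optimize.curve_fit and uses 1/sqrt(count) as a rough Poisson-style uncertainty on log(count). The helper name log_cum_model and the starting values are made up for this example.

```python
import numpy as np
from scipy import stats
from scipy.optimize import curve_fit

# Mock data, same setup as in the question (seeded here for repeatability)
rng = np.random.default_rng(0)
x1 = rng.lognormal(0.3, 1.3, size=100_000)
bins = 10 ** np.arange(0, 4, 0.01)
counts, edges = np.histogram(x1, bins=bins)

# Counts in each bin plus every bin to its right
cum = np.cumsum(counts[::-1])[::-1]

def log_cum_model(x, mu, sigma, log_n):
    # Hypothetical helper: log of N * P(X > x) for a lognormal,
    # where P(X > x) = 1 - Phi((ln x - mu) / sigma)
    return log_n + stats.norm.logsf((np.log(x) - mu) / sigma)

# Drop the empty tail so that log(cum) is defined
mask = cum > 0
x_fit = edges[:-1][mask]
y_fit = np.log(cum[mask])

# Assumption: the relative error on a count k is about 1/sqrt(k),
# so the error on log(k) is about the same; this downweights the noisy tail
weights = 1.0 / np.sqrt(cum[mask])

p0 = [0.0, 1.0, np.log(cum[0])]  # rough starting guess
bounds = ([-np.inf, 1e-6, 0.0], [np.inf, np.inf, np.inf])  # keep sigma positive
(mu_fit, sigma_fit, log_n_fit), _ = curve_fit(
    log_cum_model, x_fit, y_fit, p0=p0, sigma=weights, bounds=bounds
)

print("mu =", mu_fit, "sigma =", sigma_fit, "N =", np.exp(log_n_fit))
```

One caveat with this sketch: neighbouring points of a cumulative share most of their counts, so they are strongly correlated, and the covariance matrix returned by curve_fit should not be read as a reliable error estimate. Fitting the per-bin counts instead would avoid that correlation.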

As you can probably guess, I'm not used to working with these distributions.

Thanks!
