Histogram overlay plot with lognormal distribution

I want to check how well my data fit a lognormal distribution, which I suspect they follow, by plotting a histogram and overlaying the lognormal PDF as a line. I estimate the lognormal parameters from the data and evaluate the fitted PDF at n=1000 evenly spaced points (the same number as the data). data_list is a list containing my 1000 data points, which are integers.

import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import lognorm

...
data = np.array(data_list)

plt.hist(data, bins=32, density=True, alpha=0.6, color='g', label='Data')

sigma, _, mu = lognorm.fit(np.log(data), floc=0)
x = np.linspace(min(data), max(data), 1000)
lognormal_data = lognorm.pdf(x, sigma, scale=np.exp(mu))


plt.plot(x, lognormal_data, 'r-', lw=2, label='Lognormal Distribution')
plt.xlabel('X-axis Label')
plt.ylabel('Y-axis Label')
plt.legend()
plt.title('Histogram Overlay with Lognormal Distribution')
plt.grid(True)

plt.show()

However, this is the resulting plot: [plot image not available]

It seems like the estimated parameters of the lognormal distribution are off, as the curve does not coincide with the data. Furthermore, the curve looks more normal than lognormal. Does anybody see what I'm doing wrong here?

Accepted answer, by Tranbi:

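I'm no statistician, but if you suspect that your data have a lognormal distribution, shouldn't you fit data itself rather than np.log(data)? lognorm describes the raw values (it is their logarithm that is normally distributed), so passing it np.log(data) fits a lognormal to the already-logged values instead.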
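The reasoning behind fitting the raw values: if X is lognormal, log X is normal. A minimal sketch, assuming synthetic data in place of data_list and the location fixed at 0 (floc=0), showing that a normal fit to np.log(data) and a lognorm fit to data describe the same distribution, with s equal to σ and log(scale) equal to μ:

```python
import numpy as np
from scipy.stats import lognorm, norm

# Hypothetical synthetic sample standing in for data_list (true mu=1, sigma=0.2)
rng = np.random.default_rng(0)
data = rng.lognormal(mean=1.0, sigma=0.2, size=1000)

# Route 1: fit a normal distribution to the logged data -> (mu, sigma)
mu_hat, sigma_hat = norm.fit(np.log(data))

# Route 2: fit a lognormal to the raw data with loc fixed at 0 -> (s, loc, scale)
s_hat, loc_hat, scale_hat = lognorm.fit(data, floc=0)

print(f"norm on log(data): mu={mu_hat:.4f}, sigma={sigma_hat:.4f}")
print(f"lognorm on data:   log(scale)={np.log(scale_hat):.4f}, s={s_hat:.4f}")
```

Either route works; the problem only arises when the two are mixed, i.e. when logged data are passed to lognorm.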

The documentation of the fit method states that it returns the following:

Estimates for any shape parameters (if applicable), followed by those for location and scale.
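Since fit returns its estimates in the same order in which lognorm's methods accept them (the single shape parameter s, then loc, then scale), the whole tuple can be unpacked straight into a frozen distribution instead of naming each value. A minimal sketch, assuming synthetic data in place of data_list and the location fixed at 0:

```python
import numpy as np
from scipy.stats import lognorm

# Hypothetical synthetic sample standing in for data_list
rng = np.random.default_rng(42)
data = rng.lognormal(mean=1.0, sigma=0.2, size=1000)

params = lognorm.fit(data, floc=0)  # (s, loc, scale); loc stays fixed at 0
fitted = lognorm(*params)           # frozen distribution with those parameters

x = np.linspace(data.min(), data.max(), 1000)
y = fitted.pdf(x)                   # same values as lognorm.pdf(x, *params)
print(lognorm.shapes, params)       # lognorm has one shape parameter, named 's'
```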

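The same documentation gives the signature of lognorm.pdf as pdf(x, s, loc=0, scale=1). In this parametrization, s is the standard deviation σ of the underlying normal distribution and scale = exp(μ), so the third value returned by fit is already the scale, not μ.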
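To confirm how s and scale relate to the μ and σ of the underlying normal distribution, a minimal sketch comparing lognorm.pdf with the lognormal density written out by hand, f(x) = exp(-(ln x - μ)²/(2σ²)) / (xσ√(2π)), for hypothetical values μ = 1 and σ = 0.2:

```python
import numpy as np
from scipy.stats import lognorm

# Hypothetical parameters of the underlying normal distribution
mu, sigma = 1.0, 0.2
x = np.linspace(0.5, 6.0, 200)

# scipy's parametrization: shape s = sigma, scale = exp(mu), loc left at 0
pdf_scipy = lognorm.pdf(x, sigma, scale=np.exp(mu))

# Textbook lognormal density written out explicitly
pdf_formula = np.exp(-(np.log(x) - mu) ** 2 / (2 * sigma**2)) / (x * sigma * np.sqrt(2 * np.pi))

print(np.allclose(pdf_scipy, pdf_formula))  # True
```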

I would therefore try the following:

import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import lognorm

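# Synthetic lognormal sample standing in for data_list
data = np.random.lognormal(mean=1, sigma=0.2, size=1000)
plt.hist(data, bins=50, density=True, alpha=0.6, color='g', label='Data')

# Fit shape s, location and scale directly on the raw data (not on np.log(data))
s, loc, scale = lognorm.fit(data)
# Evaluate the fitted PDF on 1000 evenly spaced points across the data range
x = np.linspace(min(data), max(data), 1000)
lognormal_data = lognorm.pdf(x, s, loc=loc, scale=scale)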

plt.plot(x, lognormal_data, 'r-', lw=2, label='Lognormal Distribution')
plt.xlabel('X-axis Label')
plt.ylabel('Y-axis Label')
plt.legend()
plt.title('Histogram Overlay with Lognormal Distribution')
plt.grid(True)

plt.show()

Output:

[plot image not available]
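Beyond eyeballing the overlay, the fit can also be quantified. A minimal sketch, assuming synthetic data and using scipy.stats.kstest as a swapped-in goodness-of-fit check (it is not part of the answer above): it measures the largest gap between the empirical CDF and the fitted lognormal CDF. The p-value is optimistic when the parameters were estimated from the same sample, and for integer-valued data with many ties the test is only approximate, so treat it as a rough indication.

```python
import numpy as np
from scipy.stats import lognorm, kstest

# Hypothetical synthetic sample standing in for data_list
rng = np.random.default_rng(7)
data = rng.lognormal(mean=1.0, sigma=0.2, size=1000)

# Two-parameter lognormal fit (loc fixed at 0); a free-loc fit works the same way
s, loc, scale = lognorm.fit(data, floc=0)

# Kolmogorov-Smirnov statistic: max distance between empirical and fitted CDFs
result = kstest(data, 'lognorm', args=(s, loc, scale))
print(f"KS statistic={result.statistic:.4f}, p-value={result.pvalue:.3f}")
```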