I want to check the fit of my data, which I suspect is lognormally distributed, by plotting a histogram and overlaying the lognormal PDF as a line. I estimate the lognormal parameters from the data and evaluate the PDF at n=1000 points (the same number as the data). data_list is a list containing 1000 of my data points, which are integers.
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import lognorm
...
data = np.array(data_list)
plt.hist(data, bins=32, density=True, alpha=0.6, color='g', label='Data')
sigma, _, mu = lognorm.fit(np.log(data), floc=0)
x = np.linspace(min(data), max(data), 1000)
lognormal_data = lognorm.pdf(x, sigma, scale=np.exp(mu))
plt.plot(x, lognormal_data, 'r-', lw=2, label='Lognormal Distribution')
plt.xlabel('X-axis Label')
plt.ylabel('Y-axis Label')
plt.legend()
plt.title('Histogram Overlay with Lognormal Distribution')
plt.grid(True)
plt.show()
However, the resulting plot is this:
It seems like the estimated parameters for the lognormal distribution are off, as the curve does not coincide with the data. Furthermore, the curve looks more normal than lognormal. Does anybody see what I'm doing wrong here?
I'm no statistician, but if you suspect that `data` has a lognormal distribution, shouldn't you try to fit `data` instead of `np.log(data)`?
The documentation of the fit method states that it returns the following:
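For reference, SciPy's fit returns the shape parameter(s) first, followed by location and scale. A small sketch on synthetic data can confirm that order; the sample and the names mu_true and sigma_true below are made up for illustration and are not the asker's data:

```python
import numpy as np
from scipy.stats import lognorm

# Synthetic stand-in data (hypothetical, not the asker's data_list):
# log(sample) is normal with mean mu_true and standard deviation sigma_true.
rng = np.random.default_rng(0)
mu_true, sigma_true = 1.0, 0.5
sample = rng.lognormal(mean=mu_true, sigma=sigma_true, size=5000)

# Fit the raw sample, not its logarithm; floc=0 pins the location at zero.
shape, loc, scale = lognorm.fit(sample, floc=0)

# In SciPy's parameterisation, shape estimates sigma and scale estimates exp(mu),
# so log(scale) should land close to mu_true.
print(shape, loc, np.log(scale))
```

If the fit is given np.log(sample) instead, the returned values describe a lognormal fitted to the logged values, which is not the distribution of the original data.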
The same documentation states that `lognorm.pdf` has the following signature: `pdf(x, s, loc=0, scale=1)`.
I would therefore try the following:
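One way this might look, as a minimal sketch rather than a definitive fix: fit the raw data and pass the three returned values straight to lognorm.pdf. The helper names fit_lognormal_curve and plot_fit, and the synthetic data array, are hypothetical stand-ins; the sketch assumes all data points are strictly positive, which fixing floc=0 requires.

```python
import numpy as np
from scipy.stats import lognorm


def fit_lognormal_curve(data, num_points=1000):
    """Fit a lognormal to the raw data and evaluate its PDF over the data range."""
    data = np.asarray(data, dtype=float)
    # fit returns (shape, loc, scale); floc=0 keeps the location fixed at zero.
    shape, loc, scale = lognorm.fit(data, floc=0)
    x = np.linspace(data.min(), data.max(), num_points)
    # Pass the fitted values in the order pdf(x, s, loc, scale) expects.
    pdf = lognorm.pdf(x, shape, loc=loc, scale=scale)
    return x, pdf, (shape, loc, scale)


def plot_fit(data, bins=32):
    """Overlay the fitted PDF on a density-normalised histogram."""
    import matplotlib.pyplot as plt  # imported here so the fit works without it

    x, pdf, _ = fit_lognormal_curve(data)
    plt.hist(data, bins=bins, density=True, alpha=0.6, color='g', label='Data')
    plt.plot(x, pdf, 'r-', lw=2, label='Lognormal fit')
    plt.legend()
    plt.show()


# Hypothetical positive integer data standing in for data_list.
rng = np.random.default_rng(0)
data = np.ceil(rng.lognormal(mean=3.0, sigma=0.6, size=1000))
x, pdf, (shape, loc, scale) = fit_lognormal_curve(data)
```

Calling plot_fit(data_list) with the real data would then draw the histogram with the fitted curve on top.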
Output: