I'm trying to test whether my data follows a normal distribution, but `kstest` is not working as I expect. I generate the data with `normal` from NumPy, which "draws random samples from a normal (Gaussian) distribution":
```python
from scipy.stats import kstest, norm
from numpy.random import seed, normal

seed(42)
data = normal(80, 6, 1000)
# data = norm.rvs(loc=80, scale=6, size=1000)

ksstat, p_value = kstest(data, "norm")
if p_value > 0.05:
    print('it looks like Gaussian (fail to reject H0)')
else:
    print("it doesn't look like Gaussian (reject H0)")
```
I already tried two ways of generating normally distributed data, with NumPy and with SciPy (the commented-out line), but in both cases the test does not conclude that the data is normal. However, after transforming the data with `(data - np.mean(data))/np.std(data)`, the test does indicate a normal distribution.
What am I missing here? Why doesn't this test report normality directly?
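A minimal sketch reproducing the observation above, assuming the same seed and parameters as the question's code (the names `raw`, `z` and `standardized` are illustrative). It runs `kstest` once on the raw data and once on the standardized data, so the two results can be compared side by side:

```python
import numpy as np
from scipy.stats import kstest

# Same data as in the question: 1000 draws from N(mean=80, sd=6)
np.random.seed(42)
data = np.random.normal(80, 6, 1000)

# "norm" with no extra arguments is compared against N(0, 1),
# so data centred near 80 cannot match it
raw = kstest(data, "norm")

# Standardize with the sample mean and (population, ddof=0) standard deviation,
# as in the question's (data - np.mean(data)) / np.std(data)
z = (data - data.mean()) / data.std()
standardized = kstest(z, "norm")

print(f"raw data:     D={raw.statistic:.3f}, p={raw.pvalue:.3g}")
print(f"standardized: D={standardized.statistic:.3f}, p={standardized.pvalue:.3g}")
```

For the raw data every value lies far in the right tail of N(0, 1), so the KS statistic is essentially 1 and the p-value essentially 0; after standardizing, the data and the reference distribution share the same location and scale.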
`scipy.stats.kstest` tests the data against the given distribution, with the given distribution parameters (if any). When you use `kstest(data, "norm")`, the distribution is the standard normal distribution, with mean 0 and standard deviation 1. You generated the data with mean 80 and standard deviation 6, so naturally it does not match.

You can normalize the data as you show in the question, or, if you happen to know the parameters, you can pass them to `kstest` using the `args` parameter, e.g. `kstest(data, "norm", args=(80, 6))`.

Or, you could estimate the parameters from the data (for example with `data.mean()` and `data.std()`) and pass those as `args`.
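Both options can be sketched as follows, assuming the question's seed and parameters. Here `args` is forwarded to `norm.cdf` as `(loc, scale)`, i.e. mean and standard deviation; `norm.fit` returns the maximum-likelihood estimates, which for the normal distribution are the sample mean and the ddof=0 standard deviation. The names `known`, `estimated`, `loc` and `scale` are illustrative only:

```python
import numpy as np
from scipy.stats import kstest, norm

np.random.seed(42)
data = np.random.normal(80, 6, 1000)

# 1) Parameters known in advance: args is passed on to norm.cdf as (loc, scale)
known = kstest(data, "norm", args=(80, 6))

# 2) Parameters estimated from the data: norm.fit gives the MLE,
#    equivalent to (data.mean(), data.std())
loc, scale = norm.fit(data)
estimated = kstest(data, "norm", args=(loc, scale))

for label, res in [("known", known), ("estimated", estimated)]:
    verdict = "fail to reject H0" if res.pvalue > 0.05 else "reject H0"
    print(f"{label:9s} D={res.statistic:.4f} p={res.pvalue:.3g} -> {verdict}")
```

One caveat on the second option: when `loc` and `scale` are estimated from the same sample that is being tested, the standard KS p-value is no longer exact and tends to be too large (the test becomes conservative). The Lilliefors test, available for example as `statsmodels.stats.diagnostic.lilliefors`, is designed for that situation.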