Kolmogorov-Smirnov test in Python

I'm trying to test whether data follows a normal distribution, but kstest is not working as I expect. I generate the data with normal from numpy, which according to its documentation will "Draw random samples from a normal (Gaussian) distribution".

from scipy.stats import kstest, norm
from numpy.random import seed, normal

seed(42)
data = normal(80, 6, 1000)
# data = norm.rvs(loc=80, scale=6, size=1000)

ksstat, p_value = kstest(data, "norm")

if p_value > 0.05:
    print('it looks like Gaussian (fail to reject H0)')
else:
    print('it does not look Gaussian (reject H0)')

I have already tried generating the normal samples with both numpy and scipy, but in neither case does the test indicate that the data is normally distributed.

However, after transforming the data with (data - np.mean(data))/np.std(data), the test does indicate a normal distribution.

What am I missing here? Why doesn't this test report normality directly on the original data?

There is 1 answer

Warren Weckesser (best answer)

scipy.stats.kstest tests the data against the given distribution, using the given distribution parameters (if any). When you use kstest(data, "norm"), no parameters are given, so the distribution is the standard normal distribution, with mean 0 and standard deviation 1. You generated the data with mean 80 and standard deviation 6, so naturally it does not match.
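To make this concrete, here is a minimal sketch showing that the default call behaves the same as explicitly passing loc=0 and scale=1. It assumes data generated like in the question, but uses numpy's default_rng instead of seed/normal, and the variable names are only illustrative.

```python
import numpy as np
from scipy.stats import kstest

# Assumption: same mean, standard deviation and sample size as in the question.
rng = np.random.default_rng(42)
data = rng.normal(80, 6, 1000)

# With no args, "norm" means loc=0 and scale=1 (the standard normal).
stat_default, p_default = kstest(data, "norm")

# Passing loc=0, scale=1 explicitly should give the same result.
stat_explicit, p_explicit = kstest(data, "norm", args=(0, 1))

print(stat_default, p_default)
print(stat_explicit, p_explicit)
```

Because every sample lies far to the right of the standard normal's bulk, the KS statistic comes out close to 1 and the p-value is essentially zero, which is why the test rejects.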

You can normalize the data as you show in the question, or, if you happen to know the parameters, you can pass them to kstest using the args parameter:

ksstat, p_value = kstest(data, "norm", args=(80, 6))

Or, you could estimate the parameters from the data:

ksstat, p_value = kstest(data, "norm", args=(data.mean(), data.std()))
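The three options can be compared side by side with a short script. This is only a sketch under the question's setup (mean 80, standard deviation 6, 1000 samples); the variable names and the use of default_rng are my own choices. One caveat worth keeping in mind: when the parameters are estimated from the same data being tested, the p-value reported by kstest tends to be too large, so treat it as a rough indication. The Lilliefors test (available in statsmodels, for example) is designed for that case.

```python
import numpy as np
from scipy.stats import kstest

# Assumption: same distribution parameters and sample size as in the question.
rng = np.random.default_rng(42)
data = rng.normal(80, 6, 1000)

# Option 1: standardize the data, then test against the standard normal.
standardized = (data - data.mean()) / data.std()
_, p_standardized = kstest(standardized, "norm")

# Option 2: pass the known parameters (loc, scale) through args.
_, p_known = kstest(data, "norm", args=(80, 6))

# Option 3: pass parameters estimated from the data.
# This is effectively the same test as option 1.
_, p_estimated = kstest(data, "norm", args=(data.mean(), data.std()))

for label, p in [("standardized", p_standardized),
                 ("known parameters", p_known),
                 ("estimated parameters", p_estimated)]:
    print(f"{label}: p = {p:.3f}")
```

Options 1 and 3 should report essentially the same p-value, since standardizing the data and shifting/scaling the reference distribution are two ways of doing the same comparison.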