Kolmogorov-Smirnov Test in Python for Goodness of fit


I am trying to find the best-fitting distributions for my data. The fitting is finished, as shown in the figure below, but I need a measure to choose the best model. I compared goodness of fit using a chi-squared value, and tested for a significant difference between the observed and fitted distributions with a Kolmogorov-Smirnov (KS) test. I looked at some potential solutions (1, 2, 3) but did not find my answer. From the results in the figure below:

  1. If the p-value is higher than the KS statistic, does it mean we can accept the hypothesis, i.e. that the data fit the distribution well?

  2. Alternatively, is it OK to compare the significance level (α = 0.005) with the p-value to decide whether to accept or reject the hypothesis? If the p-value is lower than α, then it is very probable that the two distributions are different.

  3. For the Kolmogorov-Smirnov test, is it essential to standardize the data (-1, 1)?

  4. Judging from the KS statistics and p-values, exponnorm fits the data best. Is that correct?

[Figure: fitted distributions with their KS statistics and p-values (image not available)]

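I calculated the p-values as follows:

import numpy as np
import scipy.stats

# dist_names (list of scipy.stats distribution names) and y_std (the data) are defined elsewhere
p_values = []
for distribution in dist_names:
    # Set up distribution and get fitted distribution parameters
    dist = getattr(scipy.stats, distribution)
    param = dist.fit(y_std)
    # Run the KS test against the fitted distribution and keep only the p-value
    p = scipy.stats.kstest(y_std, distribution, args=param).pvalue
    p = np.around(p, 5)
    p_values.append(p)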
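For comparison, here is a minimal self-contained sketch of the same loop that keeps the KS statistic alongside the p-value and ranks the candidates by statistic (smaller means the fitted CDF is closer to the empirical one). The synthetic `y_std` sample and the short `dist_names` list are assumptions for illustration only, not the data or candidates behind the figure.

```python
import numpy as np
import scipy.stats

# Assumed stand-in data: 500 draws from a standard normal distribution.
rng = np.random.default_rng(0)
y_std = rng.normal(loc=0.0, scale=1.0, size=500)

# Assumed candidate list; replace with the distributions you actually fitted.
dist_names = ["norm", "expon", "uniform"]

results = []
for distribution in dist_names:
    dist = getattr(scipy.stats, distribution)
    param = dist.fit(y_std)  # fitted shape/loc/scale parameters
    res = scipy.stats.kstest(y_std, distribution, args=param)
    results.append((distribution, res.statistic, res.pvalue))

# Rank by KS statistic: the smallest maximum CDF distance comes first.
results.sort(key=lambda r: r[1])
for name, statistic, p_value in results:
    print(f"{name:10s} D={statistic:.4f} p={p_value:.5f}")
```

Ranking by the statistic and ranking by the p-value give the same order here because every candidate is tested on the same sample size.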

1 answer, by Newcomer:
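  1. No. The p-value and the KS statistic are on different scales, so comparing them with each other is not meaningful. Either compare the KS statistic with the critical value from a KS critical-value table, or compare the p-value with the level of significance, which is 0.005 in your case.
  2. Right. In statistics, if the p-value is smaller than the significance level α, we reject the null hypothesis (here, that the data follow the fitted distribution) in favor of the alternative.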
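  3. No, the KS test does not require standardized data. Standardizing (subtracting the mean and dividing by the standard deviation) is a linear transformation: it changes the location and scale of the data but not the shape of their distribution, so standardized geometric data, for example, are still not normally distributed. What matters is that the parameters passed to the test are fitted to the same data that you test. If you do standardize, remember that the fitted parameters then describe the standardized data, not the raw data.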
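  4. Yes, because p-value > α in this case, we fail to reject the null hypothesis that the input data follow exponnorm. Strictly speaking, this means the data are consistent with exponnorm, not that they are proven to come from it. Also note that when the parameters are estimated from the same data that are tested, the p-values reported by the standard KS test tend to be too large, so treat them as a rough guide for ranking the candidates.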
    By the way, this question is a better fit for Cross Validated, since it is mostly about statistics rather than programming. Hope this answer helps.
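The two decision rules from point 1 and the standardization remark from point 3 can be sketched as follows. This is only an illustration under stated assumptions: the `raw` sample is synthetic, the helper `ks_fitted_norm` is a hypothetical name, and the critical value uses the large-sample approximation sqrt(-ln(α/2)/2)/sqrt(n) for a fully specified distribution rather than a lookup in a table.

```python
import numpy as np
import scipy.stats

alpha = 0.005  # significance level from the question

# Assumed example data, deliberately not standardized.
rng = np.random.default_rng(1)
raw = rng.normal(loc=50.0, scale=8.0, size=300)


def ks_fitted_norm(sample):
    """Fit a normal distribution to `sample`, then KS-test the sample against that fit."""
    param = scipy.stats.norm.fit(sample)
    return scipy.stats.kstest(sample, "norm", args=param)


res_raw = ks_fitted_norm(raw)
res_std = ks_fitted_norm((raw - raw.mean()) / raw.std())

# Large-sample approximation of the two-sided KS critical value.
d_crit = np.sqrt(-0.5 * np.log(alpha / 2)) / np.sqrt(raw.size)

# Rule A: compare the p-value with alpha.
reject_by_p = res_raw.pvalue < alpha
# Rule B: compare the KS statistic with the critical value.
reject_by_d = res_raw.statistic > d_crit

print(f"raw data:          D={res_raw.statistic:.4f} p={res_raw.pvalue:.4f}")
print(f"standardized data: D={res_std.statistic:.4f} p={res_std.pvalue:.4f}")
print(f"critical value at alpha={alpha}: {d_crit:.4f}")
print(f"reject by p-value: {reject_by_p}, reject by statistic: {reject_by_d}")
```

Because the normal distribution is refitted after standardizing, the KS statistic is the same for the raw and the standardized sample up to floating-point error. For distributions whose fit is not a simple location and scale estimate, small differences from the numerical optimizer are possible.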