I changed the code to pass the Gaussian args, following Sam Mason's comment. The results still look wrong, since QQ-plots suggest the data is a decent Gaussian. My updated code is below, along with a link to the data file. Perhaps it's obvious, but I don't see how the KS test gets it so wrong (or how I do). The .csv data file can be found here: https://ln5.sync.com/dl/658503c20/5fek5x39-y8aqbkfu-tqptym98-nz75wikq
import pandas as pd
import numpy as np
from scipy import stats

alpha = 0.05
df = pd.read_csv("Z079_test_mc.csv")
columns = df.columns
with open('matrix.txt', 'a') as f:
    for col in columns:
        print ([col])
        # KS test of the column against a normal with the column's mean and std
        a, b = stats.kstest(df[[col]].dropna().values, stats.norm.cdf, args=(np.mean(df[col]),np.std(df[col])))
        print('Statistics', a, 'p-value', b)
        if b < alpha:
            print('The null hypothesis can be rejected' + '\n')
            f.write(str(col) + ',' + 'Kolmogorov Smirnov' + '\n' + \
                    ' ' + ',' + str(a) + ',' + str(b) + 'The null hypothesis can be rejected' + '\n')
        else:
            print('The null hypothesis cannot be rejected')
            f.write(str(col) + ',' + 'Kolmogorov Smirnov' + '\n' + \
                    ' ' + ',' + str(a) + ',' + str(b) + 'The null hypothesis cannot be rejected' + '\n')



The parameters for a Gaussian distribution in SciPy are the location and scale; in stats speak these are mu and sigma. Hence passing the min and max as args is breaking things. Probably easiest is just to use args=stats.norm.fit(values), or you could do it manually via args=(np.mean(values), np.std(values)).
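
A more complete example might look like the following sketch. It is not the code from the original post: the data is synthetic (a seeded normal sample with a few NaNs standing in for one column of the CSV), and the variable names are made up for illustration. It also assumes the sample is passed to kstest as a 1-D array with the NaNs removed, which is worth checking in your own code.

```python
import numpy as np
from scipy import stats

# Synthetic stand-in for one CSV column (assumption: roughly Gaussian data
# with some missing values); replace with your own column.
rng = np.random.default_rng(0)
values = rng.normal(loc=10.0, scale=2.0, size=500)
values[::50] = np.nan

# Work on a 1-D array with the NaNs removed.
clean = values[~np.isnan(values)]

# Option 1: let SciPy estimate location (mu) and scale (sigma).
loc, scale = stats.norm.fit(clean)
stat_fit, p_fit = stats.kstest(clean, "norm", args=(loc, scale))

# Option 2: compute the same two parameters manually.
stat_manual, p_manual = stats.kstest(
    clean, "norm", args=(np.mean(clean), np.std(clean))
)

print("fit:    statistic", stat_fit, "p-value", p_fit)
print("manual: statistic", stat_manual, "p-value", p_manual)
```

Both options should give the same statistic here, because for the normal distribution the fitted location and scale are the sample mean and the (population) standard deviation.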