I'm working on a typical machine learning classification task and am facing a potential data drift problem. The symptom is that my model, trained on an "old" data set, gives fine results on an evaluation subset of the "old" data, but performs very poorly (heavily misclassifies one of the classes) on some "new" data.
I suspect data drift between the old and new data and have tried several methods to confirm it. Apart from a model-based approach (training another model that tries to distinguish old rows from new rows), I tried statistical tests as well, and each of them confirmed drift. However, when I actually compare the drifted features (columns) in the old and new data, I cannot see any meaningful difference.
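For reference, this is roughly what I mean by the model-based check; the sketch below is simplified (the model choice, feature set, and function name are placeholders, and it assumes the feature columns are numeric):
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

def drift_auc(old_df: pd.DataFrame, new_df: pd.DataFrame) -> float:
    # Label old rows 0 and new rows 1, then see how well a classifier can separate them.
    X = pd.concat([old_df, new_df], ignore_index=True)
    y = np.concatenate([np.zeros(len(old_df)), np.ones(len(new_df))])
    clf = RandomForestClassifier(n_estimators=200, random_state=0)
    # ROC AUC near 0.5 means the samples are indistinguishable; well above 0.5 suggests drift.
    return cross_val_score(clf, X, y, cv=5, scoring="roc_auc").mean()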
Here is an example. A Kolmogorov-Smirnov test reports a significant difference for the same column (an important feature according to my model) between the old and new data:
import pandas as pd
from scipy.stats import kstest
kstest(old_data[column], new_data[column])[1]  # two-sample KS test; [1] picks the p-value
This gives a p-value of 2.505780e-145, which suggests the two distributions are almost completely different.
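For reference, kstest also returns the KS statistic itself (the maximum gap between the two empirical CDFs), which can be read alongside the p-value; a minimal sketch, assuming a SciPy version where kstest accepts two samples as above:
from scipy.stats import kstest
result = kstest(old_data[column], new_data[column])  # two-sample KS test
print(result.statistic)  # largest vertical distance between the two empirical CDFs
print(result.pvalue)     # the p-value quoted above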
But when I check the main descriptive statistics, they don't actually seem different:
pd.DataFrame({"Old": old_data[column].describe(), "New": new_data[column].describe()})  # side-by-side summary stats
gives:
            Old       New
mean   3.527651  3.406413
std    0.722752  0.689564
min    0.000000  0.000000
25%    3.000000  3.000000
50%    3.750000  3.083333
75%    4.000000  4.000000
max    5.000000  5.000000
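For completeness, this is how I would also compare a denser quantile grid than describe() shows (a sketch, not tied to any particular output of mine):
import numpy as np
qs = np.linspace(0, 1, 21)  # every 5th percentile
pd.DataFrame({"Old": old_data[column].quantile(qs), "New": new_data[column].quantile(qs)})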
Could these two columns actually be "different" despite the apparent similarity above? I'm confused about how to convincingly verify the supposed drift.