I have a dataset with 360 samples for class 0 and only 44 samples for class 1. When I fit a KNN model to the data using k=3, the model misclassifies lots of samples as class 0. What is the best way to deal with such unevenly sampled data? I could set k=1, but from what I have read, that lets noise have a strong effect on the predictions.

1 Answer

Community On Best Solutions

Check out this discussion on CrossValidated, especially the third answer. One approach mentioned there is to weight neighbors "by the inverse of their class size", so each neighbor's vote counts as 1 divided by the number of samples in its class. In your example with k=3, if two of the nearest neighbors are class 0 and one is class 1, the predicted label would be class 1, since 1/44 > 2/360. This is only one approach, and you can find more in the discussion linked above. I hope this helps!
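To make the arithmetic concrete, here is a minimal sketch of the inverse-class-size vote in plain Python. The function name `inverse_class_size_vote` and the `class_sizes` dictionary are made up for illustration and do not come from any library; the sketch also assumes you have already found the labels of the k nearest neighbors by whatever distance search you use.

```python
from collections import Counter


def inverse_class_size_vote(neighbor_labels, class_sizes):
    """Pick the class with the largest total weight among the neighbors.

    Each neighbor of class c contributes 1 / class_sizes[c] to that
    class's score (hypothetical helper, for illustration only).
    """
    scores = Counter()
    for label in neighbor_labels:
        scores[label] += 1.0 / class_sizes[label]
    # Return the label with the highest weighted score.
    return max(scores, key=scores.get)


class_sizes = {0: 360, 1: 44}   # class counts from the question
neighbors = [0, 0, 1]           # labels of the k=3 nearest neighbors

# Class 0 scores 2/360, class 1 scores 1/44, so class 1 wins.
print(inverse_class_size_vote(neighbors, class_sizes))
```

With this weighting, class 0 only wins when all three neighbors are class 0, so it is worth checking on held-out data whether the model now over-predicts class 1.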