Using kNN with weighted dataset


I have a dataset df:

   category  var 1  ...  var 32  weighting  country
1  blue      1.0    ...  54.2    3.0        US
2  pink      0.0    ...  101.0   1.0        other
3  blue      1.0    ...  49.9    3.0        US
4  green     1.0    ...  72.2    9.0        US

I'm using a kNN classifier (predicting the country variable), but I need it to take into account the sample weights I already have in the weighting column. Looking at the sklearn package, I can see that KNeighborsClassifier() has a weights argument. Can I set it as weights=df.weighting, or do I have to go about this another way?


1 answer

Anna Andreeva Rogotulka

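You can expand (replicate) the samples according to their weights, as in the example below, or you can write a custom weighted distance function. The weights argument of KNeighborsClassifier only controls how the neighbors are weighted when voting ('uniform', 'distance', or a callable applied to the distances), so it does not take per-sample weights like df.weighting.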

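# sample_weights, X and y are assumed to be equal-length sequences
weighted_X, weighted_y = [], []
for weight, x_sample, y_sample in zip(sample_weights, X, y):
    # Repeat each sample int(weight) times; fractional parts are truncated
    weighted_X.extend([x_sample] * int(weight))
    weighted_y.extend([y_sample] * int(weight))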
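
The same replication idea can be sketched end to end with pandas and scikit-learn. This is only an illustration under a few assumptions: the toy frame below mimics the question's df with just two numeric features (named var_1 and var_32 here), the categorical column is left out, weights are rounded to integers, and n_neighbors=3 is an arbitrary choice.

```python
import numpy as np
import pandas as pd
from sklearn.neighbors import KNeighborsClassifier

# Toy data shaped like the question's df (column names are assumptions)
df = pd.DataFrame({
    "var_1": [1.0, 0.0, 1.0, 1.0],
    "var_32": [54.2, 101.0, 49.9, 72.2],
    "weighting": [3.0, 1.0, 3.0, 9.0],
    "country": ["US", "other", "US", "US"],
})
feature_cols = ["var_1", "var_32"]

# Integer repeat counts; rounding fractional weights is an assumption
repeats = df["weighting"].round().astype(int).to_numpy()

# Repeat each row label according to its weight, then select those rows
expanded = df.loc[df.index.repeat(repeats)].reset_index(drop=True)

X = expanded[feature_cols].to_numpy()
y = expanded["country"].to_numpy()

# Fit on the expanded data, so heavier rows contribute more neighbors
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X, y)

# Classify one hypothetical new observation
prediction = knn.predict(np.array([[1.0, 60.0]]))
print(len(expanded), prediction[0])
```

Keep in mind that replication creates exact duplicates, so a single heavily weighted row can fill all k neighbor slots by itself. It is probably worth choosing n_neighbors with the expanded dataset in mind rather than the original one.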