Let's say I have a dataframe containing 3 columns as follows:
| Score | A | B |
|---|---|---|
| 0.9953 | 1 | 0 |
| 0.5436 | 0 | 1 |
I've calculated a similarity score for a string column and added it to the dataframe as `Score`. Columns A and B are also string data; I've encoded them and passed them to a KNN model, which returns the nearest neighbors based on its own similarity measure. At the moment, the nearest neighbors are found using only A and B. What should I do to incorporate the `Score` column into this model? I started exploring machine learning very recently, so I would very much appreciate your help with this.
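For concreteness, here is a minimal sketch of that setup. It assumes scikit-learn's `NearestNeighbors` and pandas (the libraries are an assumption for illustration), and the rows beyond the first two are made-up toy data.

```python
import pandas as pd
from sklearn.neighbors import NearestNeighbors

# Toy data mirroring the table above; the last two rows are invented for illustration.
df = pd.DataFrame({
    "Score": [0.9953, 0.5436, 0.7310, 0.1200],
    "A": [1, 0, 1, 0],
    "B": [0, 1, 0, 1],
})

# Only the encoded columns A and B are used as features; Score is ignored here.
features = df[["A", "B"]].to_numpy()
knn = NearestNeighbors(n_neighbors=2).fit(features)

# Look up the neighbors of the first row.
# `distances` holds Euclidean distances, `indices` holds row positions in df.
distances, indices = knn.kneighbors(features[:1])
neighbors = df.iloc[indices[0]]
```

As written, `Score` plays no part in which rows come back as neighbors.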
I thought of two options. The first is to pick a threshold, filter the dataframe on the `Score` column, and pass the filtered dataframe to the KNN model. The second is to get the nearest neighbors from the KNN model first, add each neighbor's KNN score to its `Score` value, and sort the dataframe by the sum to get the most similar rows. But I don't think either of these is the right approach.
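To make the two ideas concrete, here is a rough sketch of both, again assuming scikit-learn's `NearestNeighbors`. The threshold value, the `query` row, the toy data, and the conversion of distance to a similarity via `1 / (1 + distance)` are all assumptions made for illustration, not something I have settled on.

```python
import numpy as np
import pandas as pd
from sklearn.neighbors import NearestNeighbors

# Toy data; only the first two rows come from the table above.
df = pd.DataFrame({
    "Score": [0.9953, 0.5436, 0.7310, 0.1200, 0.8800],
    "A": [1, 0, 1, 0, 1],
    "B": [0, 1, 0, 1, 1],
})
query = np.array([[1, 0]])  # hypothetical encoded (A, B) values to search for

# Option 1: filter on Score first, then run KNN on the surviving rows.
THRESHOLD = 0.6  # arbitrary cut-off chosen for this example
filtered = df[df["Score"] >= THRESHOLD].reset_index(drop=True)
knn_filtered = NearestNeighbors(n_neighbors=2).fit(filtered[["A", "B"]].to_numpy())
_, idx_filtered = knn_filtered.kneighbors(query)
option1 = filtered.iloc[idx_filtered[0]]

# Option 2: run KNN on all rows, then add a KNN-based similarity to Score and sort.
knn_all = NearestNeighbors(n_neighbors=3).fit(df[["A", "B"]].to_numpy())
dist_all, idx_all = knn_all.kneighbors(query)
candidates = df.iloc[idx_all[0]].copy()
# KNN returns distances, so map them to a similarity in (0, 1] (assumed mapping).
candidates["knn_similarity"] = 1.0 / (1.0 + dist_all[0])
candidates["combined"] = candidates["Score"] + candidates["knn_similarity"]
option2 = candidates.sort_values("combined", ascending=False)
```

In option 1 the threshold decides which rows can ever be returned, while in option 2 the two scores are simply summed, so whichever has the larger range dominates the ranking.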