Python KNeighborsClassifier


I'm having a little problem with KNeighborsClassifier from sklearn.neighbors.

I have a huge file of ratings for movies, where each line represents a user and each column a movie.

I want to suggest a movie (one he hasn't watched yet) to a user, based on the movies he has rated and the ratings of other users.

I tried that with:

    from sklearn.neighbors import KNeighborsClassifier

    model = KNeighborsClassifier(n_neighbors=3)
    model.fit(user_rated, others_rated)
    suggestList = model.predict_proba(others_unrated)

user_rated is a list of (float) ratings. others_rated is a 2D list with ratings of the same movies the user has rated, but given by other users. others_unrated is a 2D list with other users' ratings of the movies the current user hasn't watched yet.
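To pin down the shapes involved, here is a minimal sketch that derives the three arrays from a single users × movies matrix. The matrix itself, treating row 0 as the current user, and using 0 to mean "not rated" are all assumptions about the file format, not something stated above.

```python
import numpy as np

# Hypothetical full ratings matrix: one row per user, one column per movie, 0 = not rated.
ratings = np.array([
    [4, 0, 5, 0, 0],   # current user (assumed to be row 0)
    [5, 3, 4, 1, 2],
    [1, 4, 2, 5, 3],
    [4, 2, 5, 1, 1],
    [2, 5, 1, 4, 4],
], dtype=float)

user = 0
rated = ratings[user] > 0                      # boolean mask of movies the current user has rated

user_rated = ratings[user, rated]              # 1-D: one rating per rated movie
others = np.delete(ratings, user, axis=0)      # drop the current user's row
others_rated = others[:, rated]                # other users x rated movies
others_unrated = others[:, ~rated]             # other users x unrated movies

print(user_rated.shape, others_rated.shape, others_unrated.shape)  # (2,) (4, 2) (4, 3)
```

With this layout user_rated is a vector while others_rated and others_unrated are matrices whose rows are users, which is the orientation the question's fit call uses.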

I think the problem is that others_rated is a 2D list, but if I compare against only one other user (using others_rated[user_num]) I'll accomplish nothing. With model.predict_proba(others_unrated) I get the same error whether I pass data for one user or for many: Incompatible dimension for X and Y matrices.

Any suggestions?

Best answer, by Andreus:

I am unsure of what you hope to accomplish, but let me infer a few things.

  • First, others_rated is an N_users-length list of N_movies_rated_by_this_user-length lists of floats.
  • Second, others_unrated is an N_users-length list of N_movies_not_rated_by_this_user-length lists of floats.

From these statements, and without access to your data files/arrays, I would guess this is the correct approach for what you are trying to do:

    import numpy as np
    from sklearn.neighbors import KNeighborsClassifier

    model = KNeighborsClassifier(n_neighbors=3)
    model.fit(np.transpose(others_rated), user_rated)
    suggestList = model.predict_proba(np.transpose(others_unrated))
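Here is a runnable version of that snippet on made-up data. The arrays are hypothetical, and the ranking rule at the end (expected rating = class probabilities weighted by the class labels in model.classes_) is an assumption added for illustration, not part of the answer. Note that KNeighborsClassifier treats each distinct rating value as a class, so this only works when ratings take a few discrete values such as whole stars; scikit-learn rejects fractional ratings like 3.7 as continuous targets, and KNeighborsRegressor is the usual alternative in that case.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical toy data in the question's layout (whole-star ratings).
user_rated = np.array([4.0, 5.0, 1.0, 4.0])          # this user's ratings of 4 movies
others_rated = np.array([[5.0, 4.0, 1.0, 5.0],       # 5 other users x the same 4 movies
                         [1.0, 2.0, 5.0, 1.0],
                         [5.0, 4.0, 2.0, 5.0],
                         [2.0, 1.0, 4.0, 2.0],
                         [4.0, 5.0, 1.0, 4.0]])
others_unrated = np.array([[3.0, 2.0, 1.0],          # 5 other users x 3 movies this user hasn't seen
                           [1.0, 4.0, 5.0],
                           [4.0, 1.0, 2.0],
                           [1.0, 5.0, 4.0],
                           [5.0, 1.0, 2.0]])

model = KNeighborsClassifier(n_neighbors=3)
# Samples are movies; features are the other users' ratings of each movie.
model.fit(np.transpose(others_rated), user_rated)
suggestList = model.predict_proba(np.transpose(others_unrated))  # (3 unrated movies, n_classes)

# Hypothetical ranking rule: expected rating under the predicted class probabilities.
expected = suggestList @ model.classes_
best = int(np.argmax(expected))
print(model.classes_)                       # [1. 4. 5.]
print("suggest unrated movie", best)        # suggest unrated movie 0
```

Each row of suggestList gives, for one unrated movie, the share of its 3 nearest rated movies that carry each rating; weighting those shares by the rating values is just one simple way to turn them into a single score per movie.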

The two changes I have made are as follows. First, I am nearly certain you have X and y swapped in your call to .fit(). If you don't swap them, your problem is so badly posed (mathematically) that it is almost certain to fail: you are trying to train a model to predict a matrix from a vector (predicting a lot of information from very little).
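One way to see the first point is to try both orientations on toy data (the numbers below are hypothetical). scikit-learn's fit expects X with shape (n_samples, n_features) and y with one label per sample, so passing the 1-D user_rated as X is rejected with a ValueError, while the transposed orientation is accepted. The exact error message depends on the scikit-learn version.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical data: 4 other users, 3 movies the current user has rated.
user_rated = np.array([4.0, 1.0, 5.0])            # shape (3,)
others_rated = np.array([[5.0, 1.0, 4.0],
                         [2.0, 5.0, 1.0],
                         [4.0, 2.0, 5.0],
                         [1.0, 4.0, 2.0]])         # shape (4, 3): users x rated movies

model = KNeighborsClassifier(n_neighbors=3)

# Question's orientation: X is a 1-D vector and y is a matrix, so validation fails.
try:
    model.fit(user_rated, others_rated)
    swapped_ok = True
except ValueError as err:
    swapped_ok = False
    print("rejected:", err)

# Answer's orientation: one row per rated movie (3 samples), one column per other user (4 features).
X = np.transpose(others_rated)
model.fit(X, user_rated)
print(X.shape, user_rated.shape)   # (3, 4) (3,)
```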

Second, the way you have posed the problem, the users (N_users) should be the column dimension: each movie becomes one sample, described by how the other users rated it, and the current user's rating of that movie is its label. This is the only arrangement that makes sense mathematically. The number of columns of X passed to KNeighborsClassifier.predict_proba(X) must equal the number of columns of the X passed to the earlier call to KNeighborsClassifier.fit(X, y).
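The column-count rule can be checked directly. In this sketch (made-up numbers), the model is fitted with 4 features (other users), so predict_proba accepts a matrix with 4 columns and rejects one with only 2. The wording of the ValueError varies between scikit-learn versions; a mismatch of this kind is the likely source of the "Incompatible dimension for X and Y matrices" message in the question.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical training data: 3 rated movies (rows) x 4 other users (columns).
X_train = np.array([[5.0, 2.0, 4.0, 1.0],
                    [1.0, 5.0, 2.0, 4.0],
                    [4.0, 1.0, 5.0, 2.0]])
y_train = np.array([4.0, 1.0, 5.0])
model = KNeighborsClassifier(n_neighbors=3).fit(X_train, y_train)

# 2 unrated movies, rated by the same 4 users: 4 columns, matching X_train.
X_new = np.array([[3.0, 1.0, 4.0, 1.0],
                  [1.0, 4.0, 1.0, 5.0]])
proba = model.predict_proba(X_new)   # one row per movie, one column per class in model.classes_

# The untransposed (users x movies) layout has only 2 columns instead of 4, so it is rejected.
try:
    model.predict_proba(X_new.T)
    mismatch_ok = True
except ValueError:
    mismatch_ok = False
```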