Similarity between users is always 0 when using user-based KNNBasic from the Python Surprise package


304 views · Asked by maston on 23 November 2022 at 10:58

In practice, I need to find users with similar interests based on the URL favorites of a large number of users. My data therefore contains only "likes", with no "dislike" or "ignore". And because the number of URLs is practically unlimited, I also cannot assume that every URL a user has not liked is a "dislike" or an "ignore". In this case, how should I convert the raw data into a Surprise Dataset? Or can data like this not be used at all by collaborative-filtering algorithms such as KNN?

Source data (favorite items per user):

s_data = [
    [
        "user1",
        [
            "item1",
            "item2",
            "item3",
            "item4",
            "item5",
            "item6"
        ]
    ],
    [
        "user2",
        [
            "item3",
            "item4",
            "item5",
            "item6"
        ]
    ],
    [
        "user3",
        [
            "item1",
            "item2",
            "item3",
            "item6"
        ]
    ],
    [
        "user4",
        [
            "item4",
            "item5",
            "item6",
            "item7",
            "item8",
            "item9"
        ]
    ]   
]

Because the original data only ever records that a user "likes" an item, I assume that each user gave a rating of '1' to every item they liked. Python code:

import pandas as pd
from surprise import Dataset, KNNBasic, Reader

# prepare the data: one (user, item, rating=1) row per "like"
df_pre = [[z[0], zz, 1] for z in s_data if z[1] is not None for zz in z[1]]
df = pd.DataFrame(df_pre)
reader = Reader(rating_scale=(0, 1))
data = Dataset.load_from_df(df, reader)
trainset = data.build_full_trainset()


# training
sim_options = {'name': 'pearson', 'user_based': True}
algo = KNNBasic(sim_options=sim_options)
algo.fit(trainset)


# similarity of user1 to every other user, highest first
inner_id = algo.trainset.to_inner_uid(ruid='user1')
all_instances = algo.trainset.all_users
rs = [(x, algo.sim[inner_id][x]) for x in all_instances() if x != inner_id]
sorted_rs = sorted(rs, key=lambda x: x[1], reverse=True)
print(sorted_rs)

Result: [(1, 0.0), (2, 0.0), (3, 0.0)]

(Image not reproduced: the similarity between users.)

(Image not reproduced: the raw data in tabular form.)

As shown above, the program reports a similarity of 0 between every pair of users. Switching the similarity measure to cosine or msd gives the same result, and switching to pearson_baseline raises "ZeroDivisionError: float division".
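A likely explanation for the zeros with Pearson: Pearson correlation centers each user's ratings on their mean before comparing them. When every stored rating is 1, each user's mean is also 1, so every deviation is 0 and the formula reduces to 0/0. The sketch below is a plain-Python Pearson function written only for illustration (it is not Surprise's code, and the `pearson` name is hypothetical); like the result reported above, it falls back to 0.0 when the denominator is 0.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length rating lists.

    Hypothetical helper for illustration only; returns 0.0 when the
    correlation is undefined (one of the lists has zero variance).
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # numerator: joint deviation of both users from their own means
    num = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # denominator: product of the two users' deviation magnitudes
    den = math.sqrt(sum((a - mean_x) ** 2 for a in x) * sum((b - mean_y) ** 2 for b in y))
    return num / den if den else 0.0


# user1 and user2 both liked item3..item6; with likes-only data every rating is 1
user1_common = [1, 1, 1, 1]
user2_common = [1, 1, 1, 1]
print(pearson(user1_common, user2_common))  # 0.0: every deviation is 0, so 0/0
```

Any similarity that depends on rating deviations has nothing to work with when all ratings are identical, which is why adding variation to the ratings changes the picture.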

How can I use KNN to find the users whose behavior is most similar to a given user's, with data like the above? Thanks a lot.


There is 1 answer

**ljdyer** · Answer 1 · 2022-11-23T11:43:06+00:00

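You need to include information about the items each user does not like, so that your dataset contains both 0s and 1s. The data should then be a full user × item table, with a 1 for every item the user liked and a 0 for every other item (the original answer showed the top part of it as a screenshot).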

I got this dataframe with this code:

import pandas as pd

# map each user to their list of liked items
users_and_items = {e[0]: e[1] for e in s_data}
users = sorted(users_and_items)
# every item liked by at least one user
items = sorted({item for item_list in users_and_items.values() for item in item_list})
# one row per (user, item) pair: 1 if liked, 0 otherwise
df_pre = [(user, item, 1 if item in users_and_items[user] else 0) for user in users for item in items]
df = pd.DataFrame(df_pre)
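To show what that dataframe contains without the screenshot, here is a self-contained sketch that rebuilds the same (user, item, rating) rows in plain Python and prints user1's rows. It repeats the question's data so it runs on its own, and it stores liked items as sets, which does not change the rows produced.

```python
# Question's data, repeated so this sketch runs on its own
s_data = [
    ["user1", ["item1", "item2", "item3", "item4", "item5", "item6"]],
    ["user2", ["item3", "item4", "item5", "item6"]],
    ["user3", ["item1", "item2", "item3", "item6"]],
    ["user4", ["item4", "item5", "item6", "item7", "item8", "item9"]],
]

users_and_items = {user: set(liked) for user, liked in s_data}
users = sorted(users_and_items)
items = sorted({item for liked in users_and_items.values() for item in liked})

# dense (user, item, rating) rows: 1 if liked, 0 otherwise
df_pre = [(u, i, 1 if i in users_and_items[u] else 0) for u in users for i in items]

print(len(df_pre))  # 36 rows = 4 users x 9 items
for row in df_pre[:9]:  # the first 9 rows are user1's
    print(row)
```

user1's rows are ('user1', 'item1', 1) through ('user1', 'item6', 1), followed by ('user1', 'item7', 0), ('user1', 'item8', 0) and ('user1', 'item9', 0). Those zeros are what give Pearson some variance to work with.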

Now running your code with the new df:

import pandas as pd
from surprise import Dataset, KNNBasic, Reader

# prepare the data (df is the dense 0/1 dataframe built above)
reader = Reader(rating_scale=(0, 1))
data = Dataset.load_from_df(df, reader)
trainset = data.build_full_trainset()


# training
sim_options = {'name': 'pearson', 'user_based': True}
algo = KNNBasic(sim_options=sim_options)
algo.fit(trainset)


# similarity of user1 to every other user
inner_id = algo.trainset.to_inner_uid(ruid='user1')
all_instances = algo.trainset.all_users
rs = [(x, algo.sim[inner_id][x]) for x in all_instances() if x != inner_id]
print(rs)

Gives:

Computing the pearson similarity matrix...
Done computing similarity matrix.
[(1, 0.6324555320336759), (2, 0.6324555320336759), (3, -0.5)]

I believe this is closer to what you expected to see.
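The answer's approach treats every unliked item as a 0, which is the assumption the question says it cannot make for an open-ended set of URLs. One common alternative for likes-only data, not part of the answer and not a Surprise feature, is to compare the sets of liked items directly, for example with Jaccard similarity |A ∩ B| / |A ∪ B|. A minimal plain-Python sketch on the question's data (the `jaccard` helper is hypothetical):

```python
def jaccard(a, b):
    """Jaccard similarity of two sets: |a & b| / |a | b| (hypothetical helper)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# question's data as user -> set of liked items
likes = {
    "user1": {"item1", "item2", "item3", "item4", "item5", "item6"},
    "user2": {"item3", "item4", "item5", "item6"},
    "user3": {"item1", "item2", "item3", "item6"},
    "user4": {"item4", "item5", "item6", "item7", "item8", "item9"},
}

target = "user1"
# rank every other user by similarity to the target, highest first
ranked = sorted(
    ((other, jaccard(likes[target], likes[other])) for other in likes if other != target),
    key=lambda pair: pair[1],
    reverse=True,
)
for other, sim in ranked:
    print(other, round(sim, 3))
```

This prints user2 0.667, user3 0.667 and user4 0.333, the same ordering as the Pearson result above. Because only liked items enter the intersection and union, URLs that neither user liked have no effect on the score.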

TechQA.
