How to load CSV file instead of built in dataset in "Surprise" Python recommender system?

Question


Asked by Alan on 07 April 2021 at 00:01 (1.7k views)

I don't know how to write code that loads a CSV file (or a .inter file) instead of the built-in dataset in this example, which evaluates a recommender system on a dataset:

from surprise import SVD
from surprise import KNNBasic
from surprise import Dataset
from surprise.model_selection import cross_validate

# Load the movielens-100k dataset (download it if needed).
data = Dataset.load_builtin('ml-100k')

# Use a basic k-NN algorithm (replace with SVD() to use the SVD algorithm instead).
algo = KNNBasic()

# Run 5-fold cross-validation and print results.
cross_validate(algo, data, measures=['RMSE', 'MAE'], cv=5, verbose=True)

What would the full line of code look like if I only need to supply the data path and the file name? I have searched the Surprise website but didn't find anything. In short, I don't want the MovieLens line from the example; I want a line that loads a file from a given data path.


There is 1 answer

**patrpok** · Answer 1 · 2021-05-18T11:53:17+00:00

First, you need to create an instance of Reader:

reader = Reader(line_format=u'rating user item', sep=',', rating_scale=(1, 6), skip_lines=1)

Note that line_format may only contain the field names user, item and rating (plus, optionally, timestamp), separated by spaces, and these names have nothing to do with the column names in your custom_rating.csv. That's why the skip_lines=1 parameter is set: it skips the first line of the CSV file, where the column names are usually defined. The order of the names in line_format, on the other hand, must match the order of the columns in the file. So, to be clear, my custom_rating.csv looks like this:
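To make the "names don't matter, order does" point concrete, the sketch below derives a line_format string from a CSV header using a hand-written mapping from your column names to Surprise's field names. The helper line_format_from_header and the COLUMN_TO_FIELD mapping are hypothetical illustrations, not part of Surprise; adjust the mapping to your own column names.

```python
import csv
import io

# Hypothetical mapping from this CSV's column names to Surprise's field names.
COLUMN_TO_FIELD = {"userId": "user", "movieId": "item", "rating": "rating", "timestamp": "timestamp"}

def line_format_from_header(header_line: str, sep: str = ",") -> str:
    """Build a space-separated line_format string in the same order as the file's columns."""
    columns = next(csv.reader(io.StringIO(header_line), delimiter=sep))
    fields = []
    for column in columns:
        name = column.strip()
        if name not in COLUMN_TO_FIELD:
            raise ValueError(f"no Surprise field known for column {name!r}")
        fields.append(COLUMN_TO_FIELD[name])
    # user, item and rating are required; timestamp is optional.
    missing = {"user", "item", "rating"} - set(fields)
    if missing:
        raise ValueError(f"header lacks required fields: {sorted(missing)}")
    return " ".join(fields)

print(line_format_from_header("rating,userId,movieId"))  # rating user item
```

The result can then be passed as Reader(line_format=..., sep=',', skip_lines=1, ...).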

rating,userId,movieId
4,1,1
6,1,2
1,1,3
. . .
. . .
. . .
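Roughly speaking, the Reader skips the first skip_lines lines, splits each remaining line on sep, and picks out the columns by their position in line_format. The pure-Python sketch below imitates that on the sample file so you can sanity-check your settings before handing the file to Surprise. parse_ratings is a hypothetical helper that only approximates Surprise's internal parsing; the rating_scale range check is an extra check added here for your own data.

```python
import io

def parse_ratings(lines, line_format="rating user item", sep=",", rating_scale=(1, 6), skip_lines=1):
    """Return (user, item, rating) tuples, reading columns by their position in line_format."""
    fields = line_format.split()
    positions = {name: fields.index(name) for name in ("user", "item", "rating")}
    low, high = rating_scale
    ratings = []
    for line_number, line in enumerate(lines):
        if line_number < skip_lines or not line.strip():
            continue  # skip the header row(s) and blank lines
        values = [value.strip() for value in line.split(sep)]
        rating = float(values[positions["rating"]])
        if not low <= rating <= high:
            raise ValueError(f"line {line_number + 1}: rating {rating} outside {rating_scale}")
        # Raw user and item ids stay strings, as they do in Surprise
        ratings.append((values[positions["user"]], values[positions["item"]], rating))
    return ratings

sample = io.StringIO("rating,userId,movieId\n4,1,1\n6,1,2\n1,1,3\n")
print(parse_ratings(sample))  # [('1', '1', 4.0), ('1', '2', 6.0), ('1', '3', 1.0)]
```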

Now you can create your data instance:

data = Dataset.load_from_file("custom_rating.csv", reader=reader)
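Since the question asks for a version where you only supply a data path and a file name, one option is to join the two yourself and pass the result as the file path. The sketch below uses the standard library's pathlib; ratings_path is a hypothetical helper, "/path/to/data" is a placeholder, and the commented-out usage line assumes the reader created above.

```python
from pathlib import Path

def ratings_path(datapath: str, filename: str) -> Path:
    """Join a data directory and a file name, failing early if the file does not exist."""
    path = Path(datapath).expanduser() / filename
    if not path.is_file():
        raise FileNotFoundError(f"ratings file not found: {path}")
    return path

# Usage with Surprise (assumes `reader` was created as shown above):
# data = Dataset.load_from_file(str(ratings_path("/path/to/data", "custom_rating.csv")), reader=reader)
```

Converting the path with str() keeps the call safe regardless of whether load_from_file accepts Path objects directly.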

Finally, you can train an SVD model as shown in the Surprise examples:

# sample random trainset and testset
# test set is made of 20% of the ratings.
trainset, testset = train_test_split(data, test_size=.2)

# We'll use the famous SVD algorithm.
algo = SVD()

# Train the algorithm on the trainset, and predict ratings for the testset
algo.fit(trainset)
predictions = algo.test(testset)

# Then compute RMSE
accuracy.rmse(predictions)
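accuracy.rmse reports the root-mean-square error between the true and the estimated ratings of the test set (each Surprise prediction carries them as r_ui and est). For intuition, here is the same formula computed by hand; the (true, estimated) pairs are made up for illustration and are not output from the model above.

```python
import math

def rmse(pairs):
    """Root-mean-square error over (true_rating, estimated_rating) pairs."""
    if not pairs:
        raise ValueError("cannot compute RMSE of an empty prediction list")
    squared_errors = [(true - est) ** 2 for true, est in pairs]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Made-up (true, estimated) ratings; the errors are 0.5, 1.0 and 1.0
print(round(rmse([(4, 3.5), (6, 5.0), (1, 2.0)]), 4))  # 0.866
```

With real Surprise output you would build the pairs as [(p.r_ui, p.est) for p in predictions].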

PS: Also, don't forget to import the required modules at the beginning of your code :)

from surprise import SVD
from surprise import Dataset
from surprise import accuracy
from surprise import Reader
from surprise.model_selection import train_test_split
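The question also mentions .inter files. Assuming these are RecBole-style atomic files (tab-separated, with a header such as user_id:token, item_id:token, rating:float, timestamp:float), the same Reader approach should work with sep='\t', skip_lines=1 and a line_format that follows the header order. The sketch below derives that line_format from such a header; INTER_TO_FIELD and line_format_from_inter_header are hypothetical, and the mapping is an assumption about your file, so adjust it to your actual columns.

```python
# Assumed mapping from RecBole-style column names (before the ':type' suffix) to Surprise fields.
INTER_TO_FIELD = {"user_id": "user", "item_id": "item", "rating": "rating", "timestamp": "timestamp"}

def line_format_from_inter_header(header_line: str) -> str:
    """Turn a tab-separated 'name:type' header into a Surprise line_format string."""
    fields = []
    for column in header_line.rstrip("\n").split("\t"):
        name = column.split(":", 1)[0].strip()
        fields.append(INTER_TO_FIELD[name])  # a KeyError means the mapping needs extending
    return " ".join(fields)

header = "user_id:token\titem_id:token\trating:float\ttimestamp:float"
print(line_format_from_inter_header(header))  # user item rating timestamp
# Then, for example (rating_scale must match your data):
# reader = Reader(line_format="user item rating timestamp", sep="\t", skip_lines=1, rating_scale=(1, 5))
```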
