I'm trying to build a product recommender using the PySpark ML recommendation ALS matrix factorization model. I have data like the example below, with a customer id, a product id, and the count of times the customer has purchased the product (prch_cnt). I'm training the model for implicit preferences.
What I'm wondering is whether I need to normalize prch_cnt before feeding it to the model. For example, should the prch_cnt for customer_id=5 and product_id=1 below be 3/(3+1+1), or is 3 just fine? My understanding is that for explicit data like ratings, the range of values for each product is normally fixed (like 1 to 5 stars), and otherwise you have to normalize. Is a fixed range of possible values, or a matching scale, a requirement for implicit feedback as well?
data:
+-----------+--------+----------+
|customer_id|prch_cnt|product_id|
+-----------+--------+----------+
|          5|     3.0|         1|
|          5|     1.0|         2|
|          5|     1.0|         2|
|          7|    10.0|         1|
|          7|     1.0|         2|
|          9|   150.0|         2|
+-----------+--------+----------+
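For reference, this is roughly how I'm building that DataFrame (the column types are my assumption; in reality the data comes from a larger table):

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("als_example").getOrCreate()

# Example purchase counts matching the table above
training = spark.createDataFrame(
    [
        (5, 3.0, 1),
        (5, 1.0, 2),
        (5, 1.0, 2),
        (7, 10.0, 1),
        (7, 1.0, 2),
        (9, 150.0, 2),
    ],
    ["customer_id", "prch_cnt", "product_id"],
)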
code:
from pyspark.ml.recommendation import ALS

# Implicit-feedback ALS on the raw purchase counts
als = ALS(implicitPrefs=True,
          nonnegative=True,
          userCol="customer_id",
          itemCol="product_id",
          ratingCol="prch_cnt",
          coldStartStrategy="drop")
model = als.fit(training)

# top 5 product recommendations for each customer
userRecs = model.recommendForAllUsers(5)
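In case it helps, here is a hypothetical sketch of the per-customer normalization I'm describing (dividing each prch_cnt by that customer's total), which I'm not sure is actually needed:

from pyspark.sql import Window
from pyspark.sql import functions as F

# Hypothetical: scale each prch_cnt by the customer's total purchase count
w = Window.partitionBy("customer_id")
normalized = training.withColumn(
    "prch_cnt_norm",
    F.col("prch_cnt") / F.sum("prch_cnt").over(w)
)
# e.g. customer_id=5, product_id=1 -> 3.0 / (3.0 + 1.0 + 1.0) = 0.6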
No, you don't need a fixed range of values in the rating column (explicit or implicit); the predictions simply scale with whatever values you feed in.
Example:
Initially my ratings were 0, 1, 2, 3, and I was getting prediction values like -1.6686, 2., 3. (max up to about 3).
Then I changed my ratings to also include 5 and 10, and the predictions went up to about 6.
I hope that clears up your doubt.
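As a rough sketch (assuming the training DataFrame from the question), you can fit on the raw counts directly. With implicitPrefs=True the counts are treated as confidence rather than ratings, and Spark's alpha parameter controls how strongly larger counts increase that confidence, so tuning alpha is usually the lever to reach for instead of rescaling the counts yourself:

from pyspark.ml.recommendation import ALS

# Raw counts as implicit confidence; alpha scales how much a larger
# prch_cnt increases confidence that the customer prefers the product
als_raw = ALS(implicitPrefs=True,
              nonnegative=True,
              alpha=1.0,  # default; larger values are often tried for heavy-tailed counts
              userCol="customer_id",
              itemCol="product_id",
              ratingCol="prch_cnt",
              coldStartStrategy="drop")
model_raw = als_raw.fit(training)
userRecs_raw = model_raw.recommendForAllUsers(5)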