Preparing product purchase data for PySpark ALS implicit recommendations


I'm trying to build a product recommender using the PySpark ML ALS matrix factorization model (pyspark.ml.recommendation.ALS). My data looks like the example below: a customer id, a product id, and the number of times the customer has purchased the product (prch_cnt). I'm trying to train the model on implicit preferences. What I'm wondering is whether I need to normalize prch_cnt before feeding it to the model. For example, should the value for customer_id=5 and product_id=1 below be prch_cnt=3/(3+1+1), or is prch_cnt=3 just fine? My understanding is that for explicit data such as ratings, the range of values for each product is normally fixed (like 1 to 5 stars), and otherwise you have to normalize it. Is a fixed range of possible values, or a matching scale, a requirement for implicit data as well?

data:

+------------+--------+-------------------+
|customer_id |prch_cnt|product_id         |
+------------+--------+-------------------+
|5           |3.0     |1                  |
|5           |1.0     |2                  |
|5           |1.0     |2                  |
|7           |10.0    |1                  |
|7           |1.0     |2                  |
|9           |150.0   |2                  |
+------------+--------+-------------------+
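
For concreteness, the per-customer normalization asked about above (3/(3+1+1) for customer 5) can be sketched in plain Python. This is only an illustration of the arithmetic, not PySpark code; the helper name normalize_per_customer is made up for this example, and the rows are copied from the table as shown.

```python
from collections import defaultdict

# Rows copied from the example table: (customer_id, product_id, prch_cnt).
rows = [
    (5, 1, 3.0),
    (5, 2, 1.0),
    (5, 2, 1.0),
    (7, 1, 10.0),
    (7, 2, 1.0),
    (9, 2, 150.0),
]


def normalize_per_customer(rows):
    """Divide each purchase count by that customer's total purchase count."""
    totals = defaultdict(float)
    for customer_id, _product_id, cnt in rows:
        totals[customer_id] += cnt
    return [
        (customer_id, product_id, cnt / totals[customer_id])
        for customer_id, product_id, cnt in rows
    ]


normalized = normalize_per_customer(rows)
for row in normalized:
    print(row)
```

One consequence worth noticing: after this normalization, customer 9's single product with 150 purchases gets the value 1.0, the same value any customer with a single one-off purchase would get, so the purchase volume is no longer visible to the model.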

code:

from pyspark.ml.recommendation import ALS


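# implicitPrefs=True: prch_cnt is treated as implicit feedback, not as a rating
als = ALS(implicitPrefs=True,
          nonnegative=True,
          userCol="customer_id",
          itemCol="product_id",
          ratingCol="prch_cnt",
          coldStartStrategy="drop")
# training: DataFrame with the customer_id, product_id and prch_cnt columns shown above
model = als.fit(training)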


# top 5 customer recs

userRecs = model.recommendForAllUsers(5)
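
As a rough illustration of why the scale of prch_cnt matters with implicitPrefs=True: the implicit-feedback formulation that Spark's ALS documentation refers to (Hu, Koren and Volinsky, "Collaborative Filtering for Implicit Feedback Datasets") treats the value in ratingCol as a confidence signal, roughly c = 1 + alpha * r, rather than as a rating to be reproduced. The sketch below is not Spark's code. The helper name confidence is hypothetical, and alpha=1.0 is assumed here to match the default of the ALS alpha parameter.

```python
def confidence(count, alpha=1.0):
    """Confidence weight c = 1 + alpha * r for an observed interaction r."""
    return 1.0 + alpha * abs(count)


# Two (customer, product) pairs from the example data, as raw counts
# and as per-customer shares (count divided by the customer's total).
raw_counts = {"customer 5, product 1": 3.0, "customer 9, product 2": 150.0}
shares = {"customer 5, product 1": 3.0 / 5.0, "customer 9, product 2": 150.0 / 150.0}

for key in raw_counts:
    print(key,
          "| raw confidence:", confidence(raw_counts[key]),
          "| share confidence:", confidence(shares[key]))
```

Under these assumptions, raw counts let a heavy buyer's interaction carry far more weight (151 versus 4 here), while per-customer shares compress every interaction into a confidence between 1 and 2. Which behaviour is preferable depends on the data, and alpha can be tuned in either case.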

There is 1 answer

Sachin Tiwari

If the rating column does not have a fixed range of values (whether the data is explicit or implicit), the predicted values will simply follow the range of the values you feed in.

Example:

Initially my ratings were 0, 1, 2 and 3, so the predicted values I got (for example -1.6686, 2., 3.) went up to about 3 at most.

Then I changed my rating values to also include 5 and 10, and now I get predicted values up to about 6:

+-------+------+-----------+
|movieId|userId| prediction|
+-------+------+-----------+
|     29|     3|    6.34046|
|     94|     3|  4.3311176|
|     26|     3|  3.6043417|
|      2|     3|  3.0270371|
|     46|     3|  2.3173037|
|      0|     3|  2.3090997|
|     86|     3|  1.1750394|
|     56|     3|  1.1681526|
|     76|     3|  0.6635845|
|     79|     3| 0.17606063|
|     14|     3| -0.2127747|
|     91|     3|  -0.587868|
|     66|     3|-0.72813153|
|     37|     3| -1.1676543|
|     70|     3|   -1.21106|
|     52|     3| -1.3105489|
|      8|     3| -1.6253037|
|      7|     3| -1.7214308|
+-------+------+-----------+

I hope this clears up your doubt.