Mahout content-based similarity


I have created a custom item similarity that simulates content-based similarity based on a product taxonomy. I have a user who likes only two items:

UserId    ItemId      Preference
7656361   1449133     1.00
7656361   18886199    8.00

My custom itemSimilarity returns values in [-1,1], where 1 should mean strong similarity and -1 strong dissimilarity. The two items the user liked do not share a lowest common ancestor in the taxonomy tree, so their similarity to each other is not 1. But they do have similarity values of 0, 0.20 and 0.25 with some other items.

I produce recommendations in the following way:

ItemSimilarity similarity = new CustomItemSimilarity(...); 
Recommender recommender = new GenericItemBasedRecommender(model, similarity);
List<RecommendedItem> recommendations = recommender.recommend(7656361, 10);
for (RecommendedItem recommendation : recommendations) {
    System.out.println(recommendation);
}

I am getting the following result:

RecommendedItem[item:899604, value:4.5]
RecommendedItem[item:1449081, value:4.5]
RecommendedItem[item:1449274, value:4.5]
RecommendedItem[item:1449259, value:4.5]
RecommendedItem[item:715796, value:4.5]
RecommendedItem[item:3255539, value:4.5]
RecommendedItem[item:333440, value:4.5]
RecommendedItem[item:1450204, value:4.5]
RecommendedItem[item:1209464, value:4.5]
RecommendedItem[item:1448829, value:4.5]

At first glance this looks fine: it produces recommendations. But when I printed the values returned by itemSimilarity as it compares pairs of items, I got this surprising result:

ItemID1  ItemID2    Similarity
899604   1449133    -1.0
899604   18886199   -1.0
1449081  1449133    -1.0
1449081  18886199   -1.0
1449274  1449133    -1.0
1449274  18886199   -1.0
1449259  1449133    -1.0
1449259  18886199   -1.0
715796   1449133    -1.0
715796   18886199   -1.0
3255539  1449133    -1.0
3255539  18886199   -1.0
333440   1449133    -1.0
333440   18886199   -1.0
1450204  1449133    -1.0
1450204  18886199   -1.0
1209464  1449133    -1.0
1209464  18886199   -1.0
1448829  1449133    -1.0
1448829  18886199   -1.0
228964   1449133    -1.0
228964   18886199    0.25
57648    1449133    -1.0
57648    18886199    0.0
899573   1449133    -1.0
899573   18886199    0.2
950062   1449133    -1.0
950062   18886199    0.25
5554642  1449133    -1.0
5554642  18886199    0.0
...

and there are a few more. They are not listed in the order they were produced; I just wanted to make a point. All the items that have the strongest dissimilarity of -1 are recommended, and those that have some similarity (0.0, 0.2 and 0.25) are not recommended at all. How is this possible? The itemSimilarity method of the ItemSimilarity interface has the following explanation:

Implementations of this interface define a notion of similarity between two items. Implementations should return values in the range -1.0 to 1.0, with 1.0 representing perfect similarity.

If I use similarity between [0,1] I get the following recommendations:

RecommendedItem[item:228964, value:8.0]
RecommendedItem[item:899573, value:8.0]
RecommendedItem[item:950062, value:8.0]

And the pairwise similarity is as follows (only for those three; for the others it is 0):

228964  1449133   0.0
228964  18886199  0.25
950062  1449133   0.0
950062  18886199  0.25
228964  1449133   0.0
228964  18886199  0.25

EDIT: I also printed the items most similar to 1449133 and 18886199 with ((GenericItemBasedRecommender) delegate).mostSimilarItems(new long[]{1449133, 18886199}, 10) and I got: [RecommendedItem[item:228964, value:0.125], RecommendedItem[item:950062, value:0.125], RecommendedItem[item:899573, value:0.1]]

For item 18886199 alone, ((GenericItemBasedRecommender) delegate).mostSimilarItems(new long[]{18886199}, 10) gave [RecommendedItem[item:228964, value:0.25]]. For 1449133 alone there are no similar items.

I don't understand why it does not work with strong dissimilarity. Another question is why all the predicted preference values are either 8.0 or 4.5. I can see that only item 18886199 is similar to the recommended items, but is there a way to multiply the preference of 8.0 by the similarity (0.25 in this case) and get a value of 2.0 instead of 8.0? I can't do this while computing the similarity, because I don't know the user at that point, so I think it should be done during the recommendation phase. Isn't this how the recommender should work, or should I create a custom recommender and do this myself?
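To make the arithmetic concrete, here is a minimal sketch, assuming the recommender estimates a preference as a similarity-weighted average of the user's existing preferences. That is an assumption that happens to match the numbers above, not something I have confirmed from the Mahout source. The class and method names are hypothetical and nothing here calls Mahout.

```java
class PreferenceEstimateSketch {

    // Assumed estimate: sum(sim_i * pref_i) / sum(sim_i).
    // Negative similarities give a negative denominator, so two -1.0
    // weights cancel out and leave a plain average of the preferences.
    static double weightedAverage(double[] sims, double[] prefs) {
        double weighted = 0.0;
        double total = 0.0;
        for (int i = 0; i < sims.length; i++) {
            weighted += sims[i] * prefs[i];
            total += sims[i];
        }
        return weighted / total;
    }

    // The unnormalized alternative I am asking about: sum(sim_i * pref_i).
    static double weightedSum(double[] sims, double[] prefs) {
        double weighted = 0.0;
        for (int i = 0; i < sims.length; i++) {
            weighted += sims[i] * prefs[i];
        }
        return weighted;
    }

    public static void main(String[] args) {
        // Preferences of user 7656361 for items 1449133 and 18886199.
        double[] prefs = {1.0, 8.0};

        // Both similarities -1.0 (e.g. item 899604): (-1 - 8) / -2
        System.out.println(weightedAverage(new double[]{-1.0, -1.0}, prefs)); // 4.5

        // Similarities 0.0 and 0.25 (e.g. item 228964 in [0,1]): 2 / 0.25
        System.out.println(weightedAverage(new double[]{0.0, 0.25}, prefs)); // 8.0

        // Similarities -1.0 and 0.25 (item 228964 in [-1,1]): 1 / -0.75,
        // a negative estimate of roughly -1.33.
        System.out.println(weightedAverage(new double[]{-1.0, 0.25}, prefs));

        // Unnormalized sum for similarities 0.0 and 0.25: 0.25 * 8
        System.out.println(weightedSum(new double[]{0.0, 0.25}, prefs)); // 2.0
    }
}
```

If that assumption holds, it would account for both the 4.5 and the 8.0, and the negative estimate in the third case might be why the partially similar items drop out when I use [-1,1]. The unnormalized sum gives the 2.0 I was expecting.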

I would really appreciate if someone from the Mahout community can give me directions.
