The awesome R package recommenderlab, written by Prof. Michael Hahsler, provides a recommender model based on association rules mined with arules, another of his R packages.
A minimal example, adapted from the recommenderlab documentation, can be found in another post here.
The trained AR recommender model can be used to make predictions (recommendations) for a given user:
pred <- predict(rec, dat[1:5,])
as(pred, "list")
[[1]]
[1] "whole milk" "rolls/buns" "tropical fruit"
[[2]]
[1] "whole milk"
[[3]]
character(0)
[[4]]
[1] "yogurt" "whole milk" "cream cheese " "soda"
[[5]]
[1] "whole milk"
As I understand it, prediction first finds all rules in the rule set (R) mined from the training data whose LHS matches the user's items, and then recommends the N unique RHS items of those matching rules with the highest support/confidence/lift score.
So my question is: how do you get the rules with a matching LHS for prediction?
From the source code we can see:
m <- is.subset(lhs(model$rule_base), newdata@data)
for(i in 1:nrow(newdata)) {
  recom <- head(unique(unlist(
    LIST(rhs(sort(model$rule_base[m[,i]], by=sort_measure)),
      decode=FALSE))), n)
  reclist[[i]] <- if(!is.null(recom)) recom else integer(0)
}
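To make the loop above concrete for a single user, here is a minimal base R sketch that mimics the subset test and the sort/unique/head step on a hand-made rule table. It does not call recommenderlab or arules: the rules, their confidence values, and the basket are invented for illustration, and `match_rules` is a hypothetical helper standing in for the `is.subset()` call.

```r
# Toy rule base: each rule has an LHS item set, one RHS item, and a confidence.
# All values are invented for illustration.
rules <- data.frame(
  rhs        = c("whole milk", "yogurt", "whole milk", "soda"),
  confidence = c(0.40, 0.35, 0.55, 0.20),
  stringsAsFactors = FALSE
)
rules$lhs <- list(
  c("butter"),
  c("butter"),
  c("butter", "curd"),
  c("beer")
)

# Hypothetical helper: TRUE for every rule whose LHS is a subset of the basket.
match_rules <- function(lhs_list, basket) {
  vapply(lhs_list, function(lhs) all(lhs %in% basket), logical(1))
}

basket  <- c("butter", "curd", "rolls/buns")
matched <- rules[match_rules(rules$lhs, basket), ]

# Sort the matching rules by the measure, then keep the first n distinct RHS items.
matched <- matched[order(matched$confidence, decreasing = TRUE), ]
recom   <- head(unique(matched$rhs), 2)
print(recom)
```

The `beer` rule is dropped because its LHS is not contained in the basket. The three remaining rules are ranked by confidence, and `unique()` keeps only the first occurrence of each RHS item.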
I managed to access rule_base from the trained model via
rule_base <- getModel(rec)$rule_base
But that raises another question: why does the code do head(unique(unlist(LIST(rhs(sort(model$rule_base[m[,i]], by=sort_measure)), decode=FALSE))), n) rather than first grouping the matching rules by RHS and aggregating the sort_measure (and the LHS) before sorting?
head(unique(unlist(LIST(rhs(sort(model$rule_base[m[,i]], by=sort_measure)), decode=FALSE))), n)
takes all rules with a matching LHS, sorts them by the measure, and then returns the n unique RHS items with the highest measure.
I guess you are thinking about aggregating the measure when several matching rules in the rule base share the same RHS. I thought about this as well, but then decided to use the first-match strategy. The main reason is the way association rules and frequent itemsets are created: for each longer rule you will find many shorter rules with the same RHS, so aggregating the measure by addition did not make much sense to me.
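As a rough illustration of that trade-off, the base R sketch below ranks the same set of matching rules twice, once with first-match and once by summing the measure per RHS. The rules and confidence values are made up. Summation is just one possible aggregation, chosen here because it is the one mentioned above.

```r
# Matching rules for one user (invented values). "whole milk" is the RHS of
# three nested rules, "yogurt" of a single, stronger rule.
matched <- data.frame(
  lhs        = c("{butter}", "{curd}", "{butter, curd}", "{butter, curd}"),
  rhs        = c("whole milk", "whole milk", "whole milk", "yogurt"),
  confidence = c(0.30, 0.30, 0.35, 0.60),
  stringsAsFactors = FALSE
)

# First-match: sort by the measure and keep the first occurrence of each RHS.
by_conf     <- matched[order(matched$confidence, decreasing = TRUE), ]
first_match <- head(unique(by_conf$rhs), 1)

# Alternative: sum the measure per RHS, then pick the largest total.
summed     <- tapply(matched$confidence, matched$rhs, sum)
aggregated <- names(which.max(summed))

print(first_match)  # "yogurt"
print(aggregated)   # "whole milk"
```

With these numbers the summed score for "whole milk" (0.95) overtakes "yogurt" (0.60) only because the shorter rules are counted alongside the longer rule that contains them.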