R Arules: how to remove certain itemsets from lhs/rhs

I have loaded a file as transactions in R:

path = "my_file.csv"
t = read.transactions(path,format="single", sep=';',cols=c("ID","Products"))

#get the rules:
rules = apriori(t,parameter = list(supp=0.01, conf=0.33, minlen=2, maxlen=4))
#sort by confidence:
rules = sort(rules, by="confidence", decreasing=TRUE)
#inspect the first 10 rules:
inspect(rules[1:10])

The output is:

     lhs      rhs  support  confidence        lift
[1]  {e,b} => {a}     0.01        0.97  some_value      
[2]  {a}   => {f}     0.04        0.92  some_value 
[3]  {t,f} => {a}     0.12        0.91  some_value 
[4]  {b,j} => {a}     0.09        0.82  some_value 
[5]  {e}   => {a}     0.25        0.77  some_value 
[6]  {g,h} => {a}     0.05        0.56  some_value 
[7]  {p}   => {a}     0.31        0.54  some_value 
[8]  {q,n} => {h}     0.18        0.49  some_value 
[9]  {s}   => {a}     0.07        0.46  some_value 
[10] {s,d} => {a}     0.20        0.42  some_value 

Now my problem is that the item {a} is too frequent, and I would like to configure the apriori rule generator so that {a}, or any other item I don't want to consider, does not appear in the generated rules. I know an easy way would be to remove item {a} from the transaction file before loading it; however, that is neither elegant nor practical, because I am working with hundreds of different transaction files.

Searching the web, I found this way of specifying the lhs and rhs:

rules = apriori(t,parameter = list(supp=0.01, conf=0.33, minlen=2, maxlen=4), appearance=list(default="lhs", rhs="b"))

The output of inspect is now:

      lhs     rhs         support    confidence          lift
[1]  {a,b} => {b}     other_value   other_value   other_value      
[2]  {a}   => {b}     other_value   other_value   other_value       
[3]  {a,f} => {b}     other_value   other_value   other_value       
[4]  {b,j} => {b}     other_value   other_value   other_value      
[5]  {a}   => {b}     other_value   other_value   other_value       
[6]  {a,h} => {b}     other_value   other_value   other_value       
[7]  {a}   => {b}     other_value   other_value   other_value       
[8]  {q,a} => {b}     other_value   other_value   other_value      
[9]  {a}   => {b}     other_value   other_value   other_value       
[10] {a,d} => {b}     other_value   other_value   other_value       

So it is possible to tell apriori which items we want in the rhs (or lhs), but apparently not which items we DON'T want. At least, it does not work the way I tried it (I don't want {a}):

    rules = apriori(t,parameter = list(supp=0.01, conf=0.33, minlen=2, maxlen=4), appearance=list(default="lhs", rhs!="a"))

This gives an error.

Any suggestions? Thanks

Answer by Michael Hahsler (best answer)

Have a look at ?APappearance (the help page for the appearance argument).

The first example shows how to exclude individual items from itemsets. You can also do this for mining rules:

library(arules)
data("Adult")

## find only frequent itemsets which do not contain small or large income
is <- apriori(Adult,
  parameter = list(support = 0.1, target = "frequent"),
  appearance = list(none = c("income=small", "income=large"),
                    default = "both"))
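
Applied to rule mining instead of frequent itemsets, the same appearance argument might look like the sketch below. It is only an illustration on the Adult data set: the confidence, minlen and maxlen values are taken from the question, while support = 0.1 and verbose = FALSE are assumptions made here to keep the example quick and quiet. For the data in the question, none = "a" would take the place of the income items.

```r
library(arules)
data("Adult")

# Items that should not appear anywhere in the rules (lhs or rhs).
excluded <- c("income=small", "income=large")

# target defaults to "rules"; 'none' removes the listed items from mining,
# and default = "both" lets every other item appear on either side.
rules <- apriori(
  Adult,
  parameter  = list(support = 0.1, confidence = 0.33, minlen = 2, maxlen = 4),
  appearance = list(none = excluded, default = "both"),
  control    = list(verbose = FALSE)
)

# Same post-processing as in the question.
rules <- sort(rules, by = "confidence", decreasing = TRUE)
inspect(head(rules, 10))

# Sanity check: no rule contains an excluded item on either side.
any(items(rules) %in% excluded)
```

Because the excluded items are dropped during mining rather than filtered out afterwards, nothing has to be changed in the transaction files themselves, which should help when the same call is repeated over many files.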