Association Rule Mining (Confidence and Lift)


I am currently working on a project for my university, in which I am building a cross-selling model with association rule mining.

As a result I have tons of rules, but I am not sure how to rank them to find the best ones.

Which option would be better?

Option 1: Confidence = 20%, Lift = 5

Option 2: Confidence = 50%, Lift = 2

I know confidence is important, but I have heard lift is very important as well. Should I sacrifice some confidence for more lift, or keep the two balanced?

There is 1 answer

n01dea

It depends on what the aim of the association rule mining is.

For example:

 - a database of 100,000 transactions

 - 2,000 transactions contain {(a, b)}

 - 800 transactions contain {(a, b, c)}

Support of itemset {(a, b, c)}: (800 / 100,000) * 100 = 0.8%.

The support of an itemset indicates how often a random transaction of the database contains all the items of the itemset.
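The support calculation can be sketched in Python as follows. This is only an illustration: the database is synthetic, built to match the counts in the example, and the filler item "d" and the function name `support` are assumptions that do not come from the answer.

```python
# Synthetic database matching the counts in the example.
# Assumption: the filler item "d" is invented so the total reaches 100,000.
transactions = (
    [frozenset("abc")] * 800      # contain a, b and c
    + [frozenset("ab")] * 1_200   # contain a and b but not c
    + [frozenset("d")] * 98_000   # contain none of a, b, c
)


def support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    itemset = frozenset(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


print(f"support(a, b, c) = {support('abc', transactions):.1%}")  # 0.8%
print(f"support(a, b)    = {support('ab', transactions):.1%}")   # 2.0%
```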


Confidence of association rule {(a, b)} -> {(c)}: (800 / 2,000) * 100 = 40%.

The confidence of an association rule indicates how often a transaction of the database that contains the antecedent of the rule also contains its consequent.
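Because confidence is measured relative to the antecedent, the direction of the rule matters. A small sketch using the counts from the example (the helper name `confidence` is an assumption; the 5,000 count for {(c)} is the figure the answer uses in its lift example):

```python
def confidence(count_both, count_antecedent):
    """Share of the antecedent's transactions that also contain the consequent."""
    return count_both / count_antecedent


n_ab = 2_000   # transactions containing {a, b}
n_abc = 800    # transactions containing {a, b, c}
n_c = 5_000    # transactions containing {c} (figure from the lift example)

conf_ab_to_c = confidence(n_abc, n_ab)  # rule {a, b} -> {c}
conf_c_to_ab = confidence(n_abc, n_c)   # reversed rule {c} -> {a, b}

print(f"{conf_ab_to_c:.0%}")  # 40%
print(f"{conf_c_to_ab:.0%}")  # 16%
```

The same 800 joint transactions give 40% in one direction and 16% in the other, because the denominator is always the count of the antecedent.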


Lift of association rule {(a, b)} -> {(c)}: 40 / ((5,000 / 100,000) * 100) = 8, given that {(c)} occurs in 5,000 transactions.

The lift is the ratio of the confidence to the expected confidence of an association rule. The confidence of the association rule is 40%. Expected confidence in this context is the confidence the rule would have if the occurrence of {(a, b)} in a transaction did not change the probability that {(c)} occurs in that transaction as well, which is simply the support of {(c)}.

E.g. if {(c)} occurs in 5,000 transactions of the database, then the expected confidence is (5,000 / 100,000) * 100 = 5%.
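The lift arithmetic can be checked with a short sketch. The function names are assumptions; the second function is the algebraically equivalent form support(a, b, c) / (support(a, b) * support(c)), written with raw counts.

```python
import math


def lift(n_both, n_antecedent, n_consequent, n_total):
    """Confidence divided by expected confidence (support of the consequent)."""
    confidence = n_both / n_antecedent
    expected_confidence = n_consequent / n_total
    return confidence / expected_confidence


def lift_symmetric(n_both, n_antecedent, n_consequent, n_total):
    """Same quantity, written as joint support over the product of supports."""
    return (n_both * n_total) / (n_antecedent * n_consequent)


# Counts from the example: 800 joint, 2,000 antecedent, 5,000 consequent.
value = lift(800, 2_000, 5_000, 100_000)
print(f"{value:.2f}")                                           # 8.00
print(f"{lift_symmetric(800, 2_000, 5_000, 100_000):.2f}")      # 8.00
```

The symmetric form also shows that lift, unlike confidence, has the same value for {(a, b)} -> {(c)} and for {(c)} -> {(a, b)}.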

A lift value higher than 1 indicates that the association rule is useful: transactions that contain the antecedent contain the consequent more often than expected. A lift value of 1 indicates that the rule is not useful, because the antecedent and the consequent behave as if they were independent of each other: the indication that a transaction containing {(a, b)} also contains {(c)} is no more useful than {(a, b)} being associated with {(c)} by chance. A lift value below 1 means that {(c)} occurs less often than expected when {(a, b)} is present, so the rule is not useful for recommending {(c)} either.

E.g. if all 100,000 transactions of the database contain {(c)}, the expected confidence is (100,000 / 100,000) * 100 = 100%. Every transaction that contains {(a, b)} then also contains {(c)}, so the confidence of the rule is 100% as well and the lift is 100 / 100 = 1. Therefore the association rule {(a, b)} -> {(c)} is not useful: {(c)} is in every transaction, so if there is {(a, b)} in a transaction there is {(c)} in it either way. The association tells us nothing.
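A quick check of the case where {(c)} is in every transaction: all 2,000 transactions containing {(a, b)} then necessarily contain {(c)} too, so the joint count becomes 2,000. The sketch below also labels lift values; the helper names, the tolerance, and the "rare together" counts are assumptions made up for illustration.

```python
import math


def lift_from_counts(n_both, n_antecedent, n_consequent, n_total):
    """Lift computed directly from transaction counts."""
    return (n_both * n_total) / (n_antecedent * n_consequent)


def interpret(lift_value, tol=1e-9):
    """Rough reading of a lift value (tolerance is an arbitrary assumption)."""
    if math.isclose(lift_value, 1.0, abs_tol=tol):
        return "independent"
    return "positive association" if lift_value > 1 else "negative association"


# {c} is in all 100,000 transactions, hence in all 2,000 {a, b} transactions.
everywhere = lift_from_counts(2_000, 2_000, 100_000, 100_000)
# Counts from the original example.
original = lift_from_counts(800, 2_000, 5_000, 100_000)
# Hypothetical counts: only 20 of the 2,000 {a, b} transactions contain {c}.
rare_together = lift_from_counts(20, 2_000, 5_000, 100_000)

print(everywhere, interpret(everywhere))        # 1.0 independent
print(original, interpret(original))            # 8.0 positive association
print(rare_together, interpret(rare_together))  # 0.2 negative association
```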


Here the circle closes: it depends on the aim of the association rule mining. If the aim is to create especially strong (reliable) association rules, the confidence needs to be especially high. If the purpose is to create especially useful association rules, the lift needs to be especially high.
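Applied to the two options in the question, the trade-off might be explored with a sketch like the one below. The `Rule` class and its field names are assumptions; the only formula used is lift = confidence / support(consequent), rearranged to recover the consequent's support.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    name: str
    confidence: float  # as a fraction, e.g. 0.20 for 20%
    lift: float

    @property
    def consequent_support(self):
        # lift = confidence / support(consequent), rearranged
        return self.confidence / self.lift


rules = [Rule("Option 1", 0.20, 5.0), Rule("Option 2", 0.50, 2.0)]

by_confidence = sorted(rules, key=lambda r: r.confidence, reverse=True)
by_lift = sorted(rules, key=lambda r: r.lift, reverse=True)

print([r.name for r in by_confidence])  # ['Option 2', 'Option 1']
print([r.name for r in by_lift])        # ['Option 1', 'Option 2']
for r in rules:
    # Option 1: 4%, Option 2: 25%
    print(r.name, f"{r.consequent_support:.0%}")
```

Read this way, Option 1 takes a consequent found in 4% of all transactions and raises its chance to 20% when the antecedent is present, while Option 2 takes a consequent found in 25% of all transactions and raises it to 50%. Which ranking to prefer follows from the aim described above.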