summary() of transactions is wrong for itemMatrix object


I am trying to do some market basket analysis using the arules package, but when I use the summary() function on an itemMatrix object to check which are the most frequent items, the numbers do not add up. If I do:

library(arules)
x <- read.transactions("Supermarket2014-15.csv")
summary(x)

I get:

transactions as itemMatrix in sparse format with
 5001 rows (elements/itemsets/transactions) and
 997 columns (items) and a density of 0.003557162 

most frequent items:    
45      28      42      35      22 (Other) 
503     462     444     440     413   15474 

But if I check with a for loop, or even in Excel, the count for product 45 is 513, not 503. The same goes for 28, which should be 499, and so on. Oddly, if I add up all the totals shown (15474 + 413 + 440 + 444 + 462 + 503 = 17736), I get the correct total number of transacted products.
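To compare against the for-loop or Excel counts item by item, it can help to get the full per-item count vector from arules rather than only the top five from summary(). A minimal sketch, assuming the arules package is installed; the baskets below are hypothetical toy data, not the supermarket file. itemFrequency() with type = "absolute" counts the transactions that contain each item, which should correspond to the numbers printed under "most frequent items".

```r
library(arules)

# Hypothetical in-memory baskets standing in for the real data
baskets <- list(c("45", "28"), c("45", "42"), c("45"), c("28"))
trans <- as(baskets, "transactions")

# Number of transactions containing each item, largest first
counts <- sort(itemFrequency(trans, type = "absolute"), decreasing = TRUE)
print(counts)
```

On the real object, running the same itemFrequency() call on x and looking up a single label (for example counts[["45"]]) gives a number that can be checked directly against the Excel count.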

The data has several NA values and products are factors.

str(x)

And here is the raw data (Day ranges from 1 to 28, Product ranges from 1 to 50):

Raw Data


Answer by Michael Hahsler

If you look at the output of your str(x) call, you will see under @iteminfo, in $labels, that some items have labels like "1;1". This means the items were not separated correctly when the file was read in. The default separator in read.transactions() is whitespace, but your file seems to use (at least some) semicolons. Try sep=";" in read.transactions().
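The effect of the separator can be reproduced on a tiny file. A minimal sketch, assuming the arules package is installed; the three semicolon-separated baskets are hypothetical toy data written to a temporary file, not the original CSV. With the default sep, a line such as "45;28" becomes one item labelled "45;28", so product 45 is undercounted, which matches the symptom in the question.

```r
library(arules)

# Hypothetical toy file: items within a basket separated by semicolons
tmp <- tempfile(fileext = ".csv")
writeLines(c("45;28", "45;42", "45"), tmp)

# Default sep = "" splits on whitespace only, so "45;28" stays a single item
wrong <- read.transactions(tmp)
print(itemLabels(wrong))
print(itemFrequency(wrong, type = "absolute"))

# With sep = ";" each product code becomes its own item
right <- read.transactions(tmp, sep = ";")
print(itemLabels(right))
print(itemFrequency(right, type = "absolute"))
```

On the real object, grep(";", itemLabels(x), fixed = TRUE, value = TRUE) is one quick way to list any labels that still contain a semicolon after reading.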