I am trying to do some market basket analysis using the arules package, but when I use the summary() function on an itemMatrix object to check which are the most frequent items, the numbers do not add up.
If I do:
library(arules)
x <- read.transactions("Supermarket2014-15.csv")
summary(x)
I get:
transactions as itemMatrix in sparse format with
5001 rows (elements/itemsets/transactions) and
997 columns (items) and a density of 0.003557162
most frequent items:
45 28 42 35 22 (Other)
503 462 444 440 413 15474
But if I check with a for loop, or even in Excel, the count for product 45 is 513 and not 503. The same goes for 28, which should be 499, and so on.
The odd thing is that if I sum up all the totals (15474+413+440+444+462+503) I get the correct total number of transacted products.
The data has several NA values and products are factors.

And here is the raw data (Day ranges from 1 to 28, Product ranges from 1 to 50):

If you look at the result of your str(x) call, you will see under @itemInfo and $labels that some items have labels like "1;1", etc. This means that the items were not separated correctly when the file was read in. The default separator in read.transactions() is whitespace, but you seem to have (some) semicolons there. Try sep=";" in read.transactions().
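To see the effect of the separator in isolation, here is a minimal sketch. It uses a made-up three-line file in basket format rather than your actual Supermarket2014-15.csv, so the file contents and the names tmp, bad and good are assumptions for illustration only:

```r
library(arules)

# Hypothetical toy data: one transaction per line, items separated by semicolons.
tmp <- tempfile(fileext = ".csv")
writeLines(c("45;28;42", "45;35", "28;22;45"), tmp)

# Default separator (whitespace): there is no whitespace to split on,
# so each whole line is read as a single item with a label like "45;35".
bad <- read.transactions(tmp)
itemLabels(bad)

# With sep = ";" each line is split into its individual items.
good <- read.transactions(tmp, sep = ";")
itemLabels(good)

# Absolute counts per item, comparable to the counts from a loop or Excel.
itemFrequency(good, type = "absolute")
```

If your real file is laid out differently (for example one Day;Product pair per line), you would probably also need format = "single" together with the cols argument, so check the labels with itemLabels() after reading.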