Why does as(..., "transactions") for arules in R seem to lose transactions?

I have a large dataset in a CSV file with the following properties:

  • There are 50,000 rows, each row is one transaction.
  • There are a maximum of 5 items and a minimum of 1 item in each transaction.
  • There are 5000 different possible item values.
  • There are no duplicate items in a transaction.

After loading the CSV into RStudio, I apply unclass() and then as(..., "transactions").

The result is something like this:

# transactions in sparse format with
#  5 transactions (rows) and
#  1455 items (columns)

Instead of 50,000 transactions, there are only 5 now.

Where have all the transactions gone? Was the matrix somehow transposed (as the row count in the result equals the column count of my CSV)?
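A minimal reproduction of the symptom, using a hypothetical three-row data frame `df` with the same layout as the CSV (one transaction per row, empty strings in unused cells). It suggests the result is not a transpose as such: unclass() yields one list element per column, and each element becomes a transaction.

```r
library(arules)

# Hypothetical toy data with the question's layout:
# one transaction per row, up to 5 item columns, unused cells left empty
df <- data.frame(
  V1 = c("item1", "item1", "item2"),
  V2 = c("item2", "",      "item3"),
  V3 = c("",      "",      ""),
  V4 = c("",      "",      ""),
  V5 = c("",      "",      ""),
  stringsAsFactors = FALSE
)

# unclass() turns the data.frame into a plain list with one element per COLUMN
lst <- unclass(df)
length(lst)   # 5

# as() treats every list element as one transaction, so each column becomes
# a transaction (arules may also warn about duplicated items within a column)
tr <- as(lst, "transactions")
length(tr)    # 5, not 3
```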

This may be a data pre-processing problem, but according to this post my input data should have the right format.

[I'm posting for the first time here and am fairly new to R/RStudio.]

Accepted answer by Michael Hahsler:

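Have a look at the coercion methods on the man page ?transactions. You will see that you need either a binary incidence matrix, a list of transactions, or a data.frame containing only categorical variables. Your data is none of these, so as(..., "transactions") will not give the result you want. In particular, unclass() turns the data.frame into a list with one element per column, and as() treats each list element as one transaction, which is why you get 5 transactions (one per CSV column) instead of 50,000 (one per row).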
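One way to use the "list of transactions" route is to split the data by row instead of by column and drop the empty cells. A minimal sketch, assuming the CSV has already been read into a data frame (here a hypothetical toy `df`) whose blank cells are empty strings or NA:

```r
library(arules)

# Hypothetical toy data: one row per transaction, blanks in unused cells
df <- data.frame(
  V1 = c("item1", "item1", "item2"),
  V2 = c("item2", "",      "item3"),
  V3 = c("",      "",      ""),
  stringsAsFactors = FALSE
)

# as.matrix() gives a character matrix (factor columns are converted too)
m <- as.matrix(df)

# Build one character vector of items per ROW, dropping empty and NA cells
item_lists <- lapply(seq_len(nrow(m)), function(i) {
  items <- unname(m[i, ])
  items[!is.na(items) & items != ""]
})

tr <- as(item_lists, "transactions")
length(tr)   # 3, one transaction per row
inspect(tr)
```

Indexing the matrix row by row keeps the per-row work cheap compared with subsetting the data.frame, which matters somewhat at 50,000 rows.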

I think read.transactions can read your data.

library(arules)

# create and write some data
data <- paste(
   "item1,item2,,,", 
   "item1,,,,", 
   "item2,item3,,,", 
   sep="\n")
write(data, file = "demo_basket")

# read the data (format = "basket": one transaction per line; empty fields are dropped)
tr <- read.transactions("demo_basket", format = "basket", sep=",")
inspect(tr)

    items        
[1] {item1,item2}
[2] {item1}      
[3] {item2,item3}
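If the CSV also has a header line (an assumption here, since the original screenshot is not available), read.transactions would turn that line into an extra transaction. A minimal sketch using its `skip` argument, with a hypothetical temporary file standing in for the real CSV:

```r
library(arules)

# Hypothetical CSV with a header line, written to a temporary file
csv_file <- tempfile(fileext = ".csv")
writeLines(c(
  "V1,V2,V3,V4,V5",
  "item1,item2,,,",
  "item1,,,,",
  "item2,item3,,,"
), csv_file)

# format = "basket": each line is one transaction; skip = 1 drops the header
tr <- read.transactions(csv_file, format = "basket", sep = ",", skip = 1)

# Sanity check: the number of transactions should equal the number of data lines
length(tr)   # 3
summary(tr)
```

For the real file, comparing length(tr) against the expected 50,000 rows is a quick way to confirm nothing was lost or added.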