I have a large dataset in CSV:
- There are 50,000 rows; each row is one transaction.
- Each transaction contains between 1 and 5 items.
- There are 5,000 different possible item values.
- There are no duplicate items in a transaction.
After loading the CSV into RStudio and applying unclass(), I apply as(..., "transactions").
The result is something like this:
# transactions in sparse format with
# 5 transactions (rows) and
# 1455 items (columns)
Instead of 50,000 transactions, there are only 5 now.
Where have all the transactions gone? Was the matrix somehow transposed (as the row count in the result equals the column count of my CSV)?
This may be a data pre-processing problem, but according to this post my input data should have the right format.
[I'm posting for the first time here and am fairly new to R/RStudio.]
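Have a look at the coercion methods in the man page ?transactions. You will see that you need either a binary incidence matrix, a list of transactions, or a data.frame containing only categorical variables. Your data is none of these, so as(..., "transactions") will fail or give the wrong result. In your case, unclass() turns the data.frame into a plain list of its columns, and a list is coerced with each element as one transaction, so each of the 5 CSV columns became one transaction.

I think read.transactions can read your data.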
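A minimal sketch of both routes, assuming the arules package is installed and a CSV with one transaction per line, comma-separated items, and no header row. The file contents and item names are made up for illustration. Route 1 reads the file directly with read.transactions(); route 2 keeps a read.csv() step but builds a list with one element per row (not per column) before coercing.

```r
# Sketch only: sample data is hypothetical, not the asker's dataset.
library(arules)

# Hypothetical CSV: one transaction per line, 1 to 5 items, no header row.
csv_file <- tempfile(fileext = ".csv")
writeLines(c("milk,bread",
             "beer",
             "milk,beer,eggs,bread,jam"), csv_file)

# Route 1: "basket" format treats each line of the file as one transaction.
trans1 <- read.transactions(csv_file, format = "basket", sep = ",")

# Route 2: read into a data.frame first. col.names forces 5 columns so that
# short rows are padded (with NA here, because of na.strings = "").
df <- read.csv(csv_file, header = FALSE, stringsAsFactors = FALSE,
               na.strings = "", col.names = paste0("V", 1:5), fill = TRUE)

# unclass(df) is a list with one element per COLUMN (5 here), so coercing
# it treats each column as a transaction instead of each row.
n_columns <- length(unclass(df))

# Build one character vector of items per ROW instead.
rows_as_items <- lapply(seq_len(nrow(df)), function(i) {
  items <- as.character(unlist(df[i, ], use.names = FALSE))
  items[!is.na(items) & items != ""]   # drop the padding of short rows
})
trans2 <- as(rows_as_items, "transactions")

length(trans1)   # number of transactions (rows)
nitems(trans1)   # number of distinct items (columns)
```

If the real file starts with a header line, skip it (skip = 1 in read.transactions(), header = TRUE in read.csv()). read.transactions() is the simpler choice; the list route is mainly useful when the data is already in a data.frame.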