How to filter data on an integer64 column in data.table in R


I have a 20GB transaction data set from kaggle (http://www.kaggle.com/c/acquire-valued-shoppers-challenge/data).

It has over 300 million rows and 11 variables.

That is too large to handle in R, so I want to filter the data down.

[screenshot not preserved]

The class of the id column is integer64.

There are 311541 unique ids and I want to sample 20000 of them.

I'm using data.table, but I get an error like the one in the picture.

Is there a way to sample the ids?


There is 1 answer

nicola

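If I recall correctly, integer64 values are just doubles with an integer64 class attribute on top, so the underlying storage is double. Maybe the best way to obtain your subset without making any copy is to use the setattr function in data.table, which changes attributes by reference. Try this: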

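library(data.table)

# remove the integer64 class by reference (no copy of the column)
setattr(transaction$id, "class", NULL)
# draw 20000 of the unique ids and keep only their rows
custom_sample <- sample(unique(transaction$id), 20000)
sample_transac <- transaction[id %in% custom_sample]
# give the integer64 class back, on the original table as well as the subset
setattr(transaction$id, "class", "integer64")
setattr(sample_transac$id, "class", "integer64")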
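
As a possible alternative, here is a minimal sketch that leaves the class in place. It assumes the bit64 package is attached, so that its integer64 methods for unique() and %in% are used. The toy table, the starting id and the amount column are made up for illustration and are not taken from the Kaggle data. The sample is drawn by position rather than by value, so the sampled ids stay integer64.

```r
library(data.table)
library(bit64)  # provides the integer64 class and its unique()/%in% methods

set.seed(42)  # reproducible sampling

# Toy stand-in for the real table: 1000 hypothetical ids, 5 rows each
ids <- as.integer64("86246") + as.integer64(0:999)
transaction <- data.table(id = rep(ids, each = 5), amount = runif(5000))

# integer64 is stored as double with a class attribute on top
storage <- typeof(transaction$id)  # "double"

# Sample positions rather than values, so the ids keep their class
uniq_ids <- unique(transaction$id)
custom_sample <- uniq_ids[sample.int(length(uniq_ids), 200)]

# Keep every row whose id was drawn
sample_transac <- transaction[id %in% custom_sample]
```

On the real data the 200 would become 20000. Whether this is as memory-friendly as the setattr approach on 300 million rows is not something this sketch demonstrates.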