I have a 20GB transaction data set from Kaggle (http://www.kaggle.com/c/acquire-valued-shoppers-challenge/data).
It has over 300 million rows and 11 variables.
It is too heavy to handle in R, so I want to filter the data.
The class of id is integer64.
There are 311541 unique ids and I want to sample 20000 of them.
I'm using data.table, but I get an error like the one in the picture.
Is there a way to sample ids?
If I recall correctly, `integer64` values are just `double`s masked as `integer`. Maybe the best way to obtain your subset without making any copy is to use the `setattr` function in `data.table`. Try this:
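As a rough sketch of the sampling step only, and not the `setattr` route itself: the code below draws ids by position with `sample.int` and then filters with `%in%`. The toy table `dt`, its `amount` column, the id values, and `n_keep` are all made up for illustration, and it assumes both `data.table` and `bit64` are installed and attached.

```r
library(data.table)
library(bit64)

# Toy stand-in for the transactions table (hypothetical values, 5 ids x 3 rows).
dt <- data.table(
  id     = as.integer64(rep(c(86246, 86252, 12262064, 12277270, 12332190), each = 3)),
  amount = 1:15
)

set.seed(42)
n_keep <- 2L                 # would be 20000 on the real data

# Unique ids, still of class integer64.
ids <- unique(dt$id)

# Sample positions rather than values, then index the integer64 vector.
keep <- ids[sample.int(length(ids), n_keep)]

# Keep only the rows whose id was sampled.
sub <- dt[id %in% keep]
```

On the real data, `n_keep` would be 20000 and `dt` would be whatever `fread` returns for the transactions file. Sampling positions instead of calling `sample` on the ids directly sidesteps any question of how `sample` treats the `integer64` class.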