How to update a data.table with a subset of itself?


Suppose we have a huge data.frame, data (60000 x 8000), which has been converted to a data.table with setDT(data). This converts data by reference rather than making a copy, which is great. Now I want to keep only a subset of data, for example the first 40000 rows.

id <- rep(FALSE, nrow(data))
id[1:40000] <- TRUE
data <- subset(data, id)

However, the code above doesn't perform well: it makes a copy of data, and I have to call gc() manually to release the memory. In my example, subset consumed an additional 1.6 GB, all of which gc() released. I have read some of the data.table documentation, and maybe I have missed something important. The examples I have found focus on extracting a subset of a data.table and assigning it to a new variable, rather than updating the original one. For example:

new.data <- subset(data, id)
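For reference, here is a minimal sketch of the pattern in question on a small stand-in table. The sizes and the name keep are assumptions made for illustration. As far as I can tell, data.table does not offer a way to drop rows by reference, so reassigning the result of a row subset still allocates new columns for the kept rows. The old full-size columns become unreferenced and are reclaimed at the next garbage collection, which gc() only forces to happen sooner.

```r
library(data.table)

# Small stand-in for the 60000 x 8000 table (sizes here are assumptions).
data <- as.data.frame(matrix(rnorm(600 * 8), nrow = 600))
setDT(data)                 # converts to data.table by reference

keep <- 1:400               # hypothetical stand-in for the first 40000 rows

# Row subsetting with data.table's own `[`. This still allocates new
# columns that hold only the kept rows.
data <- data[keep]

# The previous columns are no longer referenced; gc() forces an
# immediate collection instead of waiting for R to trigger one.
invisible(gc())

nrow(data)
```

Writing data[keep] instead of subset(data, id) avoids building a full-length logical vector, but it should not be expected to remove the temporary extra memory on its own.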

Thanks a lot.
