In the post "aggregation using ffdfdply function in R", there is a line like this:
splitby <- as.character(data$Date, by = 250000)
Just out of curiosity, I wonder what the by argument means. It seems to be related to ff data frames, but I'm not sure. A Google search and the R documentation for as.character and as.vector provided no useful information.
I tried some examples, but the calls below all give the same result.
d <- seq.Date(Sys.Date(), Sys.Date()+10000, by = "day")
as.character(d, by=1)
as.character(d, by=10)
as.character(d, by=100)
If anybody could tell me what it is, I'd appreciate it. Thank you in advance.
Since as.character.ff works by calling the default as.character internally, and since ff vectors can be larger than RAM, the data needs to be processed in chunks. The partition into chunks is handled by the chunk function; in this case, the relevant method is chunk.ff_vector. By default, this calculates the chunk size by dividing getOption("ffbatchbytes") by the record size. However, this behaviour can be overridden by supplying the chunk size using by.
In the example you give, the ff vector will be converted to character 250000 members at a time.
The end result will be the same for any value of by, or without by at all. Larger values will lead to greater temporary use of RAM but potentially quicker operation.
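To illustrate why the result does not depend on by, here is a minimal base R sketch of chunked conversion. The helper as_character_chunked is hypothetical and written only for this illustration; it is not the code that ff or ffbase actually runs, and it works on an ordinary in-memory vector rather than an ff vector.

```r
# Hypothetical helper: convert a vector to character `by` elements at a time.
# Illustration only; this is not the implementation used by ff or ffbase.
as_character_chunked <- function(x, by = 1000L) {
  n <- length(x)
  if (n == 0L) return(character(0))
  out <- character(n)
  # Start index of each chunk: 1, 1 + by, 1 + 2 * by, ...
  for (from in seq(1L, n, by = by)) {
    to <- min(from + by - 1L, n)
    # Only this slice is converted at once, which bounds temporary memory use.
    out[from:to] <- as.character(x[from:to])
  }
  out
}

d <- seq.Date(as.Date("2020-01-01"), by = "day", length.out = 10001)

# The chunk size changes how the work is split, not the result.
identical(as_character_chunked(d, by = 10), as.character(d))    # TRUE
identical(as_character_chunked(d, by = 2500), as.character(d))  # TRUE
```

This is also why the as.character(d, by = ...) calls in the question all return the same thing: d is a plain Date vector, so there is no chunking for by to control.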