In the post below,
aggregation using ffdfdply function in R
there is a line like this:
splitby <- as.character(data$Date, by = 250000)
Just out of curiosity, I wonder what the `by` argument means. It seems to be related to ff data frames, but I'm not sure. A Google search and the R documentation for `as.character` and `as.vector` provided no useful information.
I tried some examples, but the calls below all give the same result.
d <- seq.Date(Sys.Date(), Sys.Date()+10000, by = "day")
as.character(d, by=1)
as.character(d, by=10)
as.character(d, by=100)
If anybody could tell me what it is, I'd appreciate it. Thank you in advance.
Since `as.character.ff` works using the default `as.character` internally, and in view of the fact that ff vectors can be larger than RAM, the data needs to be processed in chunks. The partition into chunks is facilitated by the `chunk` function. In this case, the relevant method is `chunk.ff_vector`. By default, this will calculate the chunk size by dividing `getOption("ffbatchbytes")` by the record size. However, this behaviour can be overridden by supplying the chunk size using `by`.
In the example you give, the ff vector will be converted to `character` 250000 members at a time.
The end result will be the same for any `by`, or without `by` at all. Larger values will lead to greater temporary use of RAM but potentially quicker operation.
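To illustrate why the chunk size does not change the result, here is a minimal base-R sketch of chunked conversion. It does not use ff and is not how ff implements it: `chunked_as_character` is a hypothetical helper written only for this illustration, and its `by` argument simply mimics the role of the chunk size.

```r
# Hypothetical helper (not part of ff): convert x to character
# `by` elements at a time, so only one slice is converted per step.
chunked_as_character <- function(x, by = 1000L) {
  n <- length(x)
  if (n == 0L) return(character(0))
  out <- character(n)                  # preallocate the full result
  starts <- seq.int(1L, n, by = by)    # first index of each chunk
  for (from in starts) {
    to <- min(from + by - 1L, n)       # last index of this chunk
    idx <- from:to
    out[idx] <- as.character(x[idx])   # convert just this slice
  }
  out
}

d <- seq.Date(as.Date("2020-01-01"), by = "day", length.out = 10001)
res_small <- chunked_as_character(d, by = 10)
res_large <- chunked_as_character(d, by = 2500)

identical(res_small, res_large)        # chunk size does not affect the values
identical(res_small, as.character(d))  # same as converting in one go
```

Both comparisons should return `TRUE`: the chunk size only determines how much data is held in the intermediate slice at once, not what the converted values are.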