What does the "by" argument in ffbase::as.character do?


In the post below,

aggregation using ffdfdply function in R

there is a line like this:

splitby <- as.character(data$Date, by = 250000)

Just out of curiosity, I wonder what the by argument means. It seems to be related to ff data frames, but I'm not sure. A Google search and the R documentation for as.character and as.vector provided no useful information.

I tried some examples, but the calls below all give the same result.

d <- seq.Date(Sys.Date(), Sys.Date()+10000, by = "day")
as.character(d, by=1)
as.character(d, by=10)
as.character(d, by=100)

If anybody could tell me what it is, I'd appreciate it. Thank you in advance.


There are 2 answers

Nick Kennedy (best answer)

Since as.character.ff works by calling the default as.character internally, and since ff vectors can be larger than RAM, the data needs to be processed in chunks. The partition into chunks is handled by the chunk function. In this case, the relevant method is chunk.ff_vector. By default, this calculates the chunk size by dividing getOption("ffbatchbytes") by the record size. However, this behaviour can be overridden by supplying the chunk size using by.

In the example you give, the ff vector will be converted to character 250000 members at a time.

The end result will be the same for any by or without by at all. Larger values will lead to greater temporary use of RAM but potentially quicker operation.
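To illustrate why the result does not depend on by, here is a minimal base R sketch of chunked conversion. It does not use ff or ffbase at all: chunked_as_character is a hypothetical helper written only for this illustration, and it is an assumption that it mirrors the general idea of processing by elements at a time rather than the package's actual implementation.

```r
# Hypothetical helper, not part of ff or ffbase: convert a vector to
# character one chunk of `by` elements at a time.
chunked_as_character <- function(x, by) {
  n <- length(x)
  if (n == 0L) return(character(0))
  out <- character(n)
  # Start index of each chunk: 1, 1 + by, 1 + 2 * by, ...
  for (from in seq(1L, n, by = by)) {
    to <- min(from + by - 1L, n)
    # Only this slice is converted (and held in memory) at any one time.
    out[from:to] <- as.character(x[from:to])
  }
  out
}

d <- seq(as.Date("2020-01-01"), by = "day", length.out = 1000)
a <- chunked_as_character(d, by = 10)
b <- chunked_as_character(d, by = 250)

identical(a, b)                # chunk size does not change the result
identical(a, as.character(d))  # nor does chunking itself
```

The chunk size only controls how much data is converted per step, which is why it affects memory use and speed but not the output.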

smci

First, that function is ffbase's as.character method for ff vectors (as.character.ff), not plain old base::as.character.

See http://www.inside-r.org/packages/cran/ffbase/docs/as.character.ff which says

as.character(x, ...)

Arguments:
x: a ff vector
...: other parameters passed on to chunk

So the by argument is being passed through to some chunk function. Then you need to figure out which package's chunk function is being used. Type ?chunk, tell us which one, then go read its doc to see what its by argument does.