The meaning of 'c(min =0, max =0)' in vapply()

salaries <- list(leaders = c(250, 200), assistant = 100, members = c(300, 200, 180, 120, 100))


> vapply(salaries, range, c(min=0, max=0))
    leaders assistant members
min     200       100     100
max     250       100     300

In this script, the results are always the same regardless of the min and max values, so I wonder what '=0' means here.

What I've tried.

> vapply(salaries, range, c(min=0.1, max=1))
    leaders assistant members
min     200       100     100
max     250       100     300


> vapply(salaries, range, c(min=2, max=1))
    leaders assistant members
min     200       100     100
max     250       100     300


> vapply(salaries, range, c(min=1000, max=1000))
    leaders assistant members
min     200       100     100
max     250       100     300
Answer by jay.sf

It helps to understand what the third argument of vapply, FUN.VALUE, actually means.

Consider this for loop. To code it efficiently, we pre-allocate memory, i.e. we create an empty numeric array m (a matrix) that we fill in later. m has as many rows as the output of the range function has elements, which is 2, and as many columns as the object we loop over has elements, length(salaries).

> m <- array(0, dim=c(2, length(salaries)), dimnames=list(NULL, names(salaries)))
> for (i in seq_along(salaries)) {
+   m[, i] <- range(salaries[i])
+ }
> m
     leaders assistant members
[1,]     200       100     100
[2,]     250       100     300

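Since vapply automatically detects length(salaries), we only need to specify the type and length of the output of the range function, which is exactly what the third argument does. Its values are ignored; only its type, length, and names matter, which is why your results never change. How exactly you write it is up to you: c(0, 0), c(min=2, max=1), or rep.int(0, 2) all work, and if the template is named, as in c(min=0, max=0), the names become the row names of the result. I personally use numeric(length=2L) (aka double(.)), which makes it clearest that a numeric vector of length 2 is wanted.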

> vapply(X=salaries, FUN=range, FUN.VALUE=numeric(2L))
     leaders assistant members
[1,]     200       100     100
[2,]     250       100     300
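
The point that FUN.VALUE is only a template can be checked directly. The following is a small sketch that reuses salaries from the question; the variable names a, b, and d are made up for illustration.

```r
salaries <- list(leaders = c(250, 200), assistant = 100,
                 members = c(300, 200, 180, 120, 100))

# FUN.VALUE is only a template: vapply checks each result of range()
# against its type (double) and length (2); the values are ignored.
a <- vapply(salaries, range, FUN.VALUE = c(min = 0, max = 0))
b <- vapply(salaries, range, FUN.VALUE = c(min = 1000, max = 1000))
identical(a, b)   # TRUE

# The names of the template become the row names of the result.
rownames(a)       # "min" "max"

# An unnamed template gives the same numbers without row names.
d <- vapply(salaries, range, FUN.VALUE = numeric(2L))
rownames(d)       # NULL
all(a == d)       # TRUE
```

So '=0' in c(min=0, max=0) is just a convenient way to write "a named double vector of length 2".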

Note that, since your results are of type double, FUN.VALUE=integer(2L) would fail with an error.
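
A minimal sketch of that check in action, again assuming the salaries list from the question. The helper template_ok is hypothetical and only wraps the call in tryCatch so that the script keeps running; the exact wording of the error messages is left out because it may differ between R versions.

```r
salaries <- list(leaders = c(250, 200), assistant = 100,
                 members = c(300, 200, 180, 120, 100))

# Hypothetical helper: returns TRUE if vapply accepts the template,
# FALSE if it signals an error.
template_ok <- function(template) {
  tryCatch({
    vapply(salaries, range, FUN.VALUE = template)
    TRUE
  }, error = function(e) FALSE)
}

template_ok(numeric(2L))    # TRUE:  double, length 2
template_ok(integer(2L))    # FALSE: range() returns double here, not integer
template_ok(numeric(1L))    # FALSE: range() returns 2 values, not 1
template_ok(character(2L))  # FALSE: wrong type
```

This strictness is the point of vapply: a function that unexpectedly returns the wrong type or length stops the computation instead of silently changing the shape of the result.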

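Because vapply knows the type and length of each result in advance, it can pre-allocate the output. sapply gives the same result here, but it has to collect the results first and work out their common shape afterwards, which makes it somewhat slower.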

> sapply(X=salaries, FUN=range)
     leaders assistant members
[1,]     200       100     100
[2,]     250       100     300

Benchmark

To show that this actually makes a difference, here is a benchmark of the examples.

> salaries_l <- salaries[sample.int(length(salaries), 5e5, replace=TRUE)]
> microbenchmark::microbenchmark(
+   vapply=vapply(X=salaries_l, FUN=range, FUN.VALUE=numeric(2L)),
+   sapply=sapply(X=salaries_l, FUN=range),
+   `for`={
+     m <- array(0, dim=c(2, length(salaries_l)), dimnames=list(NULL, names(salaries_l)))
+     for (i in seq_along(salaries_l)) {
+       m[, i] <- range(salaries_l[i])
+     }
+     m
+   },
+   check='identical',
+   times=10L
+ )

$ Rscript --vanilla foo.R
Unit: seconds
   expr      min       lq     mean   median       uq      max neval cld
 vapply 1.538160 1.547140 1.687497 1.564839 1.880192 1.933732    10  a 
 sapply 1.746593 1.771601 1.851726 1.818901 1.944878 1.980924    10  a 
    for 2.669507 2.689559 2.860819 2.744123 3.142934 3.150242    10   b