What is the `vapply` equivalent for `mapply`?


In base R, sapply has a safer (and sometimes faster) variant called vapply. mapply is a multivariate version of sapply.

I am running into an edge case with mapply: a length-0 input to mapply (not to FUN) yields list() instead of integer(0).

Is there a vapply equivalent of mapply that allows specifying FUN.VALUE (the expected type and dimension of the return value)?

If not, what is the recommended pattern in those situations?

A toy example:

size_of_union <- function(A, B) length(union(A, B))
# normal case:
x <- list(1:3, 2, 3)
y <- list(3, 2, numeric(0)) 
mapply(size_of_union, x, y)
#> [1] 3 1 1

# edge-case:
x <- integer(0)
y <- integer(0)
mapply(size_of_union, x, y)
#> list()  # integer(0) would be desired here

A more contrived toy example:

range_of_intersect <- function(A, B) range(intersect(A, B))

x <- list(1:3, 2, 3)
y <- list(3, 2, numeric(0)) 
mapply(range_of_intersect, x, y)
#> Warning in min(x): no non-missing arguments to min; returning Inf
#> Warning in max(x): no non-missing arguments to max; returning -Inf
#>      [,1] [,2] [,3]
#> [1,]    3    2  Inf
#> [2,]    3    2 -Inf


x <- numeric(0)
y <- numeric(0)
mapply(range_of_intersect, x, y)
#> list() # structure(numeric(0), .Dim = c(2L, 0L)) would be desired

There are 3 answers

moodymudskipper On BEST ANSWER

For your first case, you could use as.integer(Map(size_of_union, x, y)).

More generally, you can still use vapply(), but you need to loop over the index rather than over the parallel vectors:

size_of_union <- function(A, B) length(union(A, B))
x <- list(1:3, 2, 3)
y <- list(3, 2, numeric(0)) 
vapply(seq_along(x), function(i) size_of_union(x[[i]], y[[i]]), integer(1))
#> [1] 3 1 1

x <- integer(0)
y <- integer(0)
vapply(seq_along(x), function(i) size_of_union(x[[i]], y[[i]]), integer(1))
#> integer(0)

range_of_intersect <- function(A, B) range(intersect(A, B))
x <- list(1:3, 2, 3)
y <- list(3, 2, numeric(0)) 
res <- vapply(seq_along(x), function(i) range_of_intersect(x[[i]], y[[i]]), numeric(2))
#> Warning in min(x): no non-missing arguments to min; returning Inf
#> Warning in max(x): no non-missing arguments to max; returning -Inf
res
#>      [,1] [,2] [,3]
#> [1,]    3    2  Inf
#> [2,]    3    2 -Inf
dput(res)
#> structure(c(3, 3, 2, 2, Inf, -Inf), dim = 2:3)

x <- numeric(0)
y <- numeric(0)
res <- vapply(seq_along(x), function(i) range_of_intersect(x[[i]], y[[i]]), numeric(2))
res
#>     
#> [1,]
#> [2,]
dput(res)
#> structure(numeric(0), dim = c(2L, 0L))

Created on 2023-07-04 with reprex v2.0.2
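
One difference from mapply() worth keeping in mind: indexing with seq_along(x) does not recycle shorter arguments, so x and y must have the same length. A minimal sketch of a wrapper that makes this explicit is shown below. The name vapply_pairs is hypothetical and not part of base R.

```r
# Hypothetical helper (not part of base R): vapply over indices with a length check.
vapply_pairs <- function(x, y, FUN, FUN.VALUE) {
  # Unlike mapply(), indexing does not recycle, so require equal lengths.
  stopifnot(length(x) == length(y))
  vapply(seq_along(x), function(i) FUN(x[[i]], y[[i]]), FUN.VALUE)
}

size_of_union <- function(A, B) length(union(A, B))

# Normal case
vapply_pairs(list(1:3, 2, 3), list(3, 2, numeric(0)), size_of_union, integer(1))
#> [1] 3 1 1

# Edge case: zero-length input still yields a typed result
vapply_pairs(integer(0), integer(0), size_of_union, integer(1))
#> integer(0)
```

The length check is a design choice for this sketch. If recycling is wanted, the inputs would have to be expanded (for example with rep_len()) before the call.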

jakub On

In base R, there is no version of mapply() that I know of that lets you enforce the type of the output. You could look into the pmap_*() functions from the purrr package.

E.g., for your example:

size_of_union <- function(A, B) length(union(A, B))

x <- list(1:3, 2, 3)
y <- list(3, 2, numeric(0)) 
purrr::pmap_int(list(x, y), size_of_union)
#> [1] 3 1 1

x <- integer(0)
y <- integer(0)
purrr::pmap_int(list(x, y), size_of_union)
#> integer(0) # purrr::pmap_int handles edge-case correctly

A different way of looking at it

Your edge-case is not really about data types, it is about zero-length inputs. vapply() probably just creates an empty vector on the basis of FUN.VALUE in order to put the results inside it, but then it does not iterate at all (the input is of length zero) and so it remains empty.

mapply() works differently, creating a list first and then coercing into an atomic vector/matrix if SIMPLIFY = TRUE (the default). So the placeholder is an empty list(), rather than an empty atomic vector.

This is also why the stopifnot() does not throw an error on zero-length input - it is never called in the first place because no iterations happened.

I would just do as.integer(mapply(..., SIMPLIFY = TRUE)), which converts list(), if it occurs, to integer(0).
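
As a rough illustration of that pattern, using the functions from the question, the sketch below applies the coercion to both the normal case and the zero-length case. The last lines show a limitation: the coercion does not bring back matrix dimensions for the range_of_intersect example.

```r
size_of_union <- function(A, B) length(union(A, B))

# Normal case: mapply() already simplifies to an integer vector.
as.integer(mapply(size_of_union, list(1:3, 2, 3), list(3, 2, numeric(0))))
#> [1] 3 1 1

# Edge case: mapply() returns list(), which as.integer() turns into integer(0).
as.integer(mapply(size_of_union, integer(0), integer(0)))
#> integer(0)

# Caveat: the same trick does not restore matrix dimensions.
range_of_intersect <- function(A, B) range(intersect(A, B))
dim(as.numeric(mapply(range_of_intersect, numeric(0), numeric(0))))
#> NULL
```

Note that as.integer() coerces rather than checks, so a FUN that returns doubles would be truncated silently instead of raising an error.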

So: If the question is "How to solve this edge case?" then this is it. If the question is "How to make base R behave like purrr?" (ensuring all resulting elements are of correct type) then I don't think there is a generally accepted pattern.

jan-glx On

I find a combination of vapply and mapply/Map easier to use than vapply over indices.

Here, mapply (with SIMPLIFY = FALSE) or Map maps the inputs to a list of return values, turning the multivariate problem into a univariate one. vapply (with FUN = identity) then only has to check the type of each return value and simplify the output accordingly.

Use it directly with either of:

vapply(mapply(my_fun, my_params1, my_params2, SIMPLIFY = FALSE), FUN = identity, FUN.VALUE = my_restype)
vapply(Map(my_fun, my_params1, my_params2), FUN = identity, FUN.VALUE = my_restype)

Or using either of the following shorthands:

vMap <- function(FUN, FUN.VALUE, ...)
  vapply(Map(FUN, ...), FUN = identity, FUN.VALUE = FUN.VALUE)
vmapply <- function(FUN, FUN.VALUE, ..., MoreArgs = NULL)
  vapply(mapply(FUN = FUN, ..., MoreArgs = MoreArgs, SIMPLIFY = FALSE), FUN = identity, FUN.VALUE = FUN.VALUE)

Simple example:

size_of_union <- function(A, B) length(union(A, B))
## normal case:
x <- list(1:3, 2, 3)
y <- list(3, 2, numeric(0)) 
vapply(Map(size_of_union, x , y), FUN = identity, FUN.VALUE = integer(1))
#> [1] 3 1 1

## edge-case:
x <- integer(0)
y <- integer(0)
vapply(Map(size_of_union, x , y), FUN = identity, FUN.VALUE = integer(1))
#> integer(0)

More contrived example:

range_of_intersect <- function(A, B) range(intersect(A, B))

## normal case:
x <- list(1:3, 2, 3)
y <- list(3, 2, numeric(0)) 
vapply(Map(range_of_intersect, x , y), FUN = identity, FUN.VALUE = numeric(2))
#> Warning in min(x): no non-missing arguments to min; returning Inf
#> Warning in max(x): no non-missing arguments to max; returning -Inf
#>      [,1] [,2] [,3]
#> [1,]    3    2  Inf
#> [2,]    3    2 -Inf

## edge-case:
x <- numeric(0)
y <- numeric(0)
vapply(Map(range_of_intersect, x , y), FUN = identity, FUN.VALUE = numeric(2))
#>     
#> [1,]
#> [2,]
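
Besides handling zero-length input, this pattern keeps the type check that vapply provides. The sketch below repeats the vMap shorthand so it runs on its own, and uses a made-up function, half_sum, that returns doubles, to show what happens when the declared FUN.VALUE does not match.

```r
# Same shorthand as in this answer, repeated so the snippet is self-contained.
vMap <- function(FUN, FUN.VALUE, ...)
  vapply(Map(FUN, ...), FUN = identity, FUN.VALUE = FUN.VALUE)

# Hypothetical function for illustration: returns a double, not an integer.
half_sum <- function(a, b) (a + b) / 2

# Declaring the matching type works.
vMap(half_sum, numeric(1), 1:3, 4:6)
#> [1] 2.5 3.5 4.5

# Declaring integer(1) makes vapply() stop instead of silently coercing.
res <- tryCatch(
  vMap(half_sum, integer(1), 1:3, 4:6),
  error = function(e) conditionMessage(e)
)
is.character(res)  # TRUE: an error was raised and its message captured
```

The error is caught with tryCatch() here only so the example runs to the end. In real code you would usually let it propagate.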

From ?Map (emphasis mine):

Map is a simple wrapper to mapply which does not attempt to simplify the result, similar to Common Lisp's mapcar (with arguments being recycled, however). Future versions may allow some control of the result type.

This hints that a multivariate version of vapply is not yet implemented in base R. And given the rare need for it and the minimal extra effort incurred by using the approach presented in this answer, it probably never will be.

Thanks to @moodymudskipper for leading me to this solution!