How to pass '...' argument into an interp() formula within lazyeval

I'm trying to do some parametrised dplyr manipulations. The simplest reproducible example to express the root of the problem is this:

# Packages used throughout
library(dplyr)
library(lazyeval)

# Data
test <- data.frame(group = rep(1:5, each = 2),
                   value = as.integer(c(NA, NA, 2, 3, 3, 5, 7, 8, 9, 0)))

> test
    group value
1      1    NA
2      1    NA
3      2     2
4      2     3
5      3     3
6      3     5
7      4     7
8      4     8
9      5     9
10     5     0 

# Summarisation example, this is what I'd like to parametrise
# so that I can pass in functions and grouping variables dynamically

test.summary <- test %>% 
                group_by(group) %>% 
                summarise(group.mean = mean(value, na.rm = TRUE))

> test.summary
Source: local data frame [5 x 2]

    group group.mean
    <int>      <dbl>
1     1        NaN
2     2        2.5
3     3        4.0  # Correct results
4     4        7.5
5     5        4.5

This is how far I got on my own:

# This works fine, but notice there's no 'na.rm = TRUE' passed in

doSummary <- function(d_in = data, func = 'mean', by = 'group') {
# d_in: data in
# func: required function for summarising
# by:   the variable to group by 
# NOTE: the summary is always for the 'value' column in any given dataframe

    # Operations for summarise_
    ops <- interp(~f(value), 
                  .values = list(f = as.name(func),
                                 value = as.name('value')))        
    d_out <- d_in %>% 
             group_by_(by) %>% 
             summarise_(.dots = setNames(ops, func))
}

> doSummary(test)
Source: local data frame [5 x 2]

  group mean(value)
  <int>       <dbl>
1     1          NA
2     2         2.5
3     3         4.0
4     4         7.5
5     5         4.5

Trying with the 'na.rm' parameter

# When I try passing in the 'na.rm = T' parameter it breaks
doSummary.na <- function(d_in = data, func = 'mean', by = 'group') {
    # Doesn't work
    ops <- interp(~do.call(f, args), 
                  .values = list(f = func,
                                 args = list(as.name('value'), na.rm = TRUE)))

    d_out <- d_in %>% 
             group_by_(by) %>% 
             summarise_(.dots = setNames(ops, func))
}

> doSummary.na(test)
Error: object 'value' not found 

Many thanks for your help!

Answer by Konrad Rudolph (best answer)

Your title mentions ... but your question doesn’t. If we don’t need to deal with ..., the answer gets a lot easier: we don’t need do.call at all and can call the function directly. Simply replace your ops definition with:

ops = interp(~f(value, na.rm = TRUE),
             f = match.fun(func), value = as.name('value'))

Note that I’ve used match.fun here instead of as.name. This is generally a better idea since it works “just like R” for function lookup. As a consequence, you can pass not only a function name as a character string but also a function object or an anonymous function:

doSummary.na(test, function (x, ...) mean(x, ...) / sd(x, ...)) # x̄/s?! Whatever.
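The lookup behaviour of match.fun can be checked in plain base R, without dplyr or lazyeval. This is only a small sketch; the helper name `resolve` is made up for the illustration and is not part of any package:

```r
# Hypothetical helper, for illustration only: resolve `f` the way R itself would.
resolve <- function(f) match.fun(f)

x <- c(1, 2, NA, 4)

# A function name given as a character string
resolve('mean')(x, na.rm = TRUE)

# The function object itself
resolve(mean)(x, na.rm = TRUE)

# An anonymous function; extra arguments are forwarded through `...`
resolve(function(v, ...) max(v, ...))(x, na.rm = TRUE)
```

All three forms resolve to something callable, which is what lets doSummary.na accept any of them as func.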

Speaking of which, your attempt to set the column names also fails; you need to put ops into a list to fix that:

d_in %>%
    group_by_(by) %>%
    summarise_(.dots = setNames(list(ops), func))

… because .dots expects a list of operations (and setNames also expects a vector/list). However, this code still won’t work if func is passed as something other than a character vector (a function object, for example), because the column name is taken directly from func. To make this more robust, use something like this:

fname = if (is.character(func)) {
        func
    } else if (is.name(substitute(func))) {
        as.character(substitute(func))
    } else {
        'func'
    }

d_in %>%
    group_by_(by) %>%
    summarise_(.dots = setNames(list(ops), fname))
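The naming logic can be tried out on its own in base R. The sketch below wraps it in a helper called `label_for`, a name invented here for illustration, and shows the three branches:

```r
# Hypothetical helper mirroring the fname logic above.
label_for <- function(func) {
    if (is.character(func)) {
        func                               # already a usable name
    } else if (is.name(substitute(func))) {
        as.character(substitute(func))     # bare symbol in the call, e.g. median
    } else {
        'func'                             # fallback for anything else
    }
}

label_for('mean')                 # "mean"
label_for(median)                 # "median"
label_for(function(x) sum(x))     # "func"
```

The substitute call only recovers a name when the caller wrote a plain symbol; an anonymous function has no name to recover, hence the fixed fallback.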

Things get more complicated if you actually want to allow passing ..., instead of known arguments, because (as far as I know) there’s simply no direct way of passing ... via interp, and, like you, I cannot get the do.call approach to work.

The ‹lazyeval› package provides the very nice function make_call, which helps us on the way to a solution. The above could also be written as

# Not good. :-(
ops = make_call(as.name(func), list(as.name('value'), na.rm = TRUE))

This works, but only when func is passed as a character vector. As explained above, that simply isn’t flexible enough.

However, make_call simply wraps base R’s as.call and we can use that directly:

ops = as.call(list(match.fun(func), as.name('value'), na.rm = TRUE))
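To see what such a call object looks like, it can be built and evaluated in base R alone. This is a minimal sketch with a made-up three-row data frame; eval with a data frame as its second argument stands in for what summarise_ does with the column:

```r
# Toy data, for illustration only.
toy <- data.frame(value = c(2, 3, NA))

# First element: the function object; second: the symbol `value`;
# third: the named argument na.rm = TRUE.
ops <- as.call(list(match.fun('mean'), as.name('value'), na.rm = TRUE))

is.call(ops)      # TRUE
ops[[2]]          # the symbol `value`, not yet evaluated

# Evaluating against the data frame resolves `value` to the column.
eval(ops, toy)    # 2.5
```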

And now we can simply pass ... on:

doSummary = function (d_in = data, func = 'mean', by = 'group', ...) {
    ops = as.call(list(match.fun(func), as.name('value'), ...))

    fname = if (is.character(func)) {
            func
        } else if (is.name(substitute(func))) {
            as.character(substitute(func))
        } else {
            'func'
        }

    d_in %>%
        group_by_(by) %>%
        summarize_(.dots = setNames(list(ops), fname))
}
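The forwarding of ... can be checked without dplyr. In the sketch below, `build_call` is a hypothetical helper that contains only the as.call line from doSummary, and split plus sapply is an assumed base R stand-in for group_by_ and summarize_:

```r
# Hypothetical helper: just the call-building step, with `...` passed through.
build_call <- function(func = 'mean', ...) {
    as.call(list(match.fun(func), as.name('value'), ...))
}

test <- data.frame(group = rep(1:5, each = 2),
                   value = as.integer(c(NA, NA, 2, 3, 3, 5, 7, 8, 9, 0)))

ops <- build_call('mean', na.rm = TRUE)

# Evaluate the same call once per group; `value` resolves to that group's column.
res <- sapply(split(test, test$group), function(d) eval(ops, d))
res
```

The per-group results match the test.summary table from the question: NaN for group 1, then 2.5, 4.0, 7.5 and 4.5.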

To be clear: doSummary could also be written using interp, but I think this would require manually building a formula object from a list, which amounts to doing very much the same as in my solution, and then (redundantly) calling interp on the result.

I generally find that while ‹lazyeval› is incredibly elegant, in some situations base R provides simpler solutions. In particular, interp is a powerful replacement for substitute, but bquote, a rather underused base R function, already provides many of the same syntactic benefits. The great benefit of ‹lazyeval› objects is that they carry around their evaluation environment, unlike base R expressions. However, this is simply not always needed.
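As a rough illustration of the bquote point, here is a base R sketch of the known-arguments case (na.rm = TRUE, no ...). It assumes func is a character string, and the data frame is made up for the example:

```r
func <- 'mean'

# .( ) is evaluated immediately and spliced into the quoted expression,
# much like a value supplied to interp.
ops <- bquote(.(as.name(func))(value, na.rm = TRUE))

ops                                           # mean(value, na.rm = TRUE)
eval(ops, data.frame(value = c(1, NA, 3)))    # 2
```

Unlike the as.call version, the resulting expression contains the symbol mean rather than the function object, so it prints readably but is looked up again when it is evaluated.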