I'm trying to do some parametrised dplyr manipulations. The simplest reproducible example to express the root of the problem is this:
# Data
test <- data.frame(group = rep(1:5, each = 2),
                   value = as.integer(c(NA, NA, 2, 3, 3, 5, 7, 8, 9, 0)))

> test
   group value
1      1    NA
2      1    NA
3      2     2
4      2     3
5      3     3
6      3     5
7      4     7
8      4     8
9      5     9
10     5     0
# Summarisation example, this is what I'd like to parametrise
# so that I can pass in functions and grouping variables dynamically
test.summary <- test %>%
  group_by(group) %>%
  summarise(group.mean = mean(value, na.rm = TRUE))

> test.summary
Source: local data frame [5 x 2]
  group group.mean
  <int>      <dbl>
1     1        NaN
2     2        2.5
3     3        4.0  # Correct results
4     4        7.5
5     5        4.5
This is how far I got alone
# This works fine, but notice there's no 'na.rm = TRUE' passed in
doSummary <- function(d_in = data, func = 'mean', by = 'group') {
  # d_in: data in
  # func: required function for summarising
  # by: the variable to group by
  # NOTE: the summary is always for the 'value' column in any given dataframe

  # Operations for summarise_
  ops <- interp(~f(value),
                .values = list(f = as.name(func),
                               value = as.name('value')))
  d_out <- d_in %>%
    group_by_(by) %>%
    summarise_(.dots = setNames(ops, func))
}

> doSummary(test)
Source: local data frame [5 x 2]
  group mean(value)
  <int>       <dbl>
1     1          NA
2     2         2.5
3     3         4.0
4     4         7.5
5     5         4.5
Trying with the 'na.rm' parameter
# When I try passing in the 'na.rm = T' parameter it breaks
doSummary.na <- function(d_in = data, func = 'mean', by = 'group') {
  # Doesn't work
  ops <- interp(~do.call(f, args),
                .values = list(f = func,
                               args = list(as.name('value'), na.rm = TRUE)))
  d_out <- d_in %>%
    group_by_(by) %>%
    summarise_(.dots = setNames(ops, func))
}

> doSummary.na(test)
Error: object 'value' not found
Many thanks for your help!
Your title mentions `...` but your question doesn’t. If we don’t need to deal with `...`, the answer gets a lot easier, because we don’t need `do.call` at all: we can call the function directly. Simply replace your `ops` definition with an `interp` formula that calls `f(value, na.rm = TRUE)`, where `f` is `match.fun(func)`.

Note that I’ve used `match.fun` here instead of `as.name`. This is generally a better idea since it works “just like R” for function lookup. As a consequence, you can pass not only a function name as a character string but also a bare function name or an anonymous function.

Speaking of which, your attempt to set the column names also fails; you need to put `ops` into a list to fix that, i.e. `setNames(list(ops), func)`, because `.dots` expects a list of operations (and `setNames` also expects a vector/list). However, this code once again won’t work if you pass a `func` object in to the function that isn’t a character vector. To make this more robust, derive the column name separately whenever `func` is not a character vector.

Things get more complicated if you actually want to allow passing `...` instead of known arguments, because (as far as I know) there’s simply no direct way of passing `...` via `interp`, and, like you, I cannot get the `do.call` approach to work.

The ‹lazyeval› package provides the very nice function `make_call`, which helps us on the way to a solution: the `ops` definition above could also be written with `make_call`. This works, BUT only when `func` is passed as a character vector. As explained above, this simply isn’t flexible.

However, `make_call` simply wraps base R’s `as.call`, and we can use that directly. And now we can simply pass `...` on as further elements of the call.

To be clear: the same could be achieved using `interp`, but I think this would require manually building a `formula` object from a list, which amounts to doing very much the same as in my solution, and then (redundantly) calling `interp` on the result.

I generally find that while ‹lazyeval› is incredibly elegant, in some situations base R provides simpler solutions. In particular, `interp` is a powerful `substitute` replacement, but `bquote`, a quite underused base R function, already provides many of the same syntactic benefits. The great benefit of ‹lazyeval› objects is that they carry around their evaluation environment, unlike base R expressions. However, this is simply not always needed.
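
As a rough illustration of the base R route described in this answer, here is a minimal sketch that builds the summarising call with `as.call` and `match.fun`, forwards `...`, and shows the `bquote` equivalent for a fixed argument list. The helper names `make_ops` and `summary_name` are made up for this example and are not from the original answer. To stay free of package dependencies, the call is evaluated per group with plain `eval()`; in the question's pipeline the same call would presumably be passed on as `summarise_(.dots = setNames(list(ops), name))`.

```r
# Data from the question.
test <- data.frame(group = rep(1:5, each = 2),
                   value = as.integer(c(NA, NA, 2, 3, 3, 5, 7, 8, 9, 0)))

# Hypothetical helper: build the unevaluated call func(value, ...).
# match.fun() accepts a function name as a string or a function object.
make_ops <- function(func, ...) {
  as.call(list(match.fun(func), as.name("value"), ...))
}

# Hypothetical helper: derive a column name that also works when `func`
# is not a character string (assumes it is called directly by the user).
summary_name <- function(func) {
  if (is.character(func)) func else deparse(substitute(func))[1]
}

# Extra arguments such as na.rm = TRUE are simply forwarded through `...`.
ops <- make_ops("mean", na.rm = TRUE)

# Evaluate the call once per group, with that group's columns in scope.
res <- vapply(split(test, test$group), function(g) eval(ops, g), numeric(1))
print(res)  # group means: NaN, 2.5, 4.0, 7.5, 4.5 (as in test.summary)

# A function object works just as well as a function name.
res_obj <- vapply(split(test, test$group),
                  function(g) eval(make_ops(mean, na.rm = TRUE), g),
                  numeric(1))

# For a fixed, known argument list, bquote() gives a similar result:
# .(f) is replaced by the value of f, the rest stays unevaluated.
f <- as.name("mean")
ops_bq <- bquote(.(f)(value, na.rm = TRUE))
res_bq <- eval(ops_bq, test[test$group == 2, ])
```

The `as.call` version inlines the function object itself into the call, which is why it does not matter whether `func` arrives as a string, a bare name, or an anonymous function. The `bquote` version keeps a symbol in the call instead, so it reads more naturally when deparsed.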