I know that lazyeval can be used inside a function to refer to column names with dplyr, but I am stuck. In general, when writing a function that uses dplyr and takes column names as function arguments, what is the most idiomatic way to do this? Thanks.
library(dplyr)
library(lazyeval)
## Create data frame
df0 <- data.frame(x=rnorm(100), y=runif(100))
##########################################
## Sample mean; this way works
##########################################
df0 %>%
  filter(!is.na(x)) %>%
  summarize(mean=mean(x))
##########################################
## Sample mean via function; does not work
##########################################
dfSummary2 <- function(df, var_y) {
  p <- df %>%
    filter(!is.na(as.name(var_y))) %>%
    summarize(mean=mean(as.name(var_y)))
  return(p)
}
dfSummary2(df0, "x")
# mean
# 1 NA
# Warning message:
# In mean.default("x") : argument is not numeric or logical: returning NA
##########################################
## Sample mean via function; also does not work
##########################################
dfSummary <- function(df, var_y) {
  p <- df %>%
    filter(!is.na(var_y)) %>%
    summarize(mean=mean(var_y))
  return(p)
}
dfSummary(df0, "x")
# mean
# 1 NA
# Warning message:
# In mean.default("x") : argument is not numeric or logical: returning NA
The comment to use summarize_ and filter_ is the correct direction if using dplyr, and more information is available with vignette("nse"). For the given problem, though, a function that uses a variable column name can also be written without requiring dplyr.
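As a minimal sketch of that idea (an assumption about the approach, not necessarily the exact code intended here), a base R function can take the column name as a string and look the column up with `[[ ]]`. The attempts in the question fail because `var_y` holds the string "x", so `mean()` receives the string rather than the column. The name `dfSummaryBase` is hypothetical.

```r
# Hypothetical helper (not from the original post): mean of one column,
# where the column is chosen by a name passed as a string.
dfSummaryBase <- function(df, var_y) {
  # Guard against anything other than a single, existing column name.
  stopifnot(is.character(var_y), length(var_y) == 1, var_y %in% names(df))

  values <- df[[var_y]]             # [[ ]] accepts a column name held in a variable
  values <- values[!is.na(values)]  # plays the role of filter(!is.na(x))
  data.frame(mean = mean(values))   # one-row result, like summarize(mean = mean(x))
}

## Same kind of data frame as in the question
df0 <- data.frame(x = rnorm(100), y = runif(100))
dfSummaryBase(df0, "x")
```

Because the column is looked up with `df[[var_y]]`, the string is used as a name instead of being evaluated as data, which is the step the two failing versions were missing.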