How to write a function with the same interface as dplyr::filter but that does something different

I would like to implement a function that has the same interface as the filter method in dplyr but that, instead of removing the rows that do not match a condition, would, for instance, return an array with an indicator variable or attach such a column to the returned tibble.

I would find this very useful, since it would allow me to compute, on a single tibble, summaries of some columns before and after filtering, as well as summaries of the rows that would have been removed.

I find the dplyr::filter interface very convenient and therefore would like to emulate it.

There are 2 answers

witek On BEST ANSWER

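You need enquo() and !! (or UQ(), the older function form of !!). Inside a function, quo() would capture the argument's own name rather than the column the caller passed, so enquo() is the right tool here. The argument is also named group_var rather than group_by so that it does not shadow dplyr's group_by() inside the function. See the following example:

library(dplyr)

df <- tibble(
  g1 = c(1, 1, 2, 2, 2),
  g2 = c(1, 2, 1, 2, 1),
  a = sample(5),
  b = sample(5)
)

my_summarise <- function(df, group_var) {
  # enquo() captures the expression the caller supplied (e.g. g1)
  quo_group_var <- enquo(group_var)
  print(quo_group_var)

  df %>%
    group_by(!!quo_group_var) %>%  # !! unquotes the captured expression
    summarise(a = mean(a))
}

my_summarise(df, g1)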

For more examples and discussion see http://dplyr.tidyverse.org/articles/programming.html
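
To tie this back to the question, the same quoting tools can probably be used for a filter-like function that flags rows instead of dropping them. Below is a minimal sketch, not a definitive implementation. The helper name flag_rows() and its .name argument are hypothetical (they are not part of dplyr). It assumes that several conditions should be combined with &, as filter() does, and that NA results should count as "not kept", since filter() drops those rows.

```r
library(dplyr)

# Hypothetical helper (not part of dplyr): takes conditions like filter(),
# but adds a logical indicator column instead of removing rows.
flag_rows <- function(.data, ..., .name = "keep") {
  # Capture all conditions passed through ... as quosures
  conds <- rlang::enquos(...)

  # With no conditions, every row is kept
  if (length(conds) == 0) {
    return(mutate(.data, !!.name := TRUE))
  }

  # Combine the conditions into a single expression joined by &
  combined <- Reduce(function(x, y) rlang::expr((!!x) & (!!y)), conds)

  # Treat NA as FALSE, mirroring how filter() drops rows with NA conditions
  mutate(.data, !!.name := coalesce(as.logical(!!combined), FALSE))
}

flagged <- flag_rows(mtcars, cyl == 4, gear > 3)

# Summaries of kept and removed rows from a single tibble
flagged %>%
  group_by(keep) %>%
  summarise(n = n(), mean_gear = mean(gear))
```

Because the indicator is an ordinary column, you can still get the filtered result afterwards with filter(flagged, keep).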

CPak On

I think group_by will help you here.

You might normally filter, then summarise, like so:

library(dplyr)
mtcars %>%
  filter(cyl==4) %>%
  summarise(mean=mean(gear))

#       mean
# 1 4.090909

Instead, you can group_by, summarise, and then filter the summary if needed:

mtcars %>%
  group_by(cyl) %>%
  summarise(mean=mean(gear))
  # optional filter here

# # A tibble: 3 x 2
#     cyl     mean
#   <dbl>    <dbl>
# 1     4 4.090909
# 2     6 3.857143
# 3     8 3.285714

You can group by conditionals as well, which gives one summary row for the rows that match and one for the rows that do not:

mtcars %>%
  group_by(cyl > 4) %>%
  summarise(mean=mean(gear))

# # A tibble: 2 x 2
#   `cyl > 4`     mean
#       <lgl>    <dbl>
# 1     FALSE 4.090909
# 2      TRUE 3.476190
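
If you also want the "before filtering" summary in the same tibble, one possible approach is to subset inside summarise() instead of grouping. This is only a sketch; the column names mean_all, mean_kept and mean_removed are made up for illustration, and cyl == 4 stands in for whatever condition you would have passed to filter().

```r
library(dplyr)

summaries <- mtcars %>%
  summarise(
    mean_all     = mean(gear),               # all rows, before filtering
    mean_kept    = mean(gear[cyl == 4]),     # rows filter(cyl == 4) would keep
    mean_removed = mean(gear[!(cyl == 4)])   # rows filter(cyl == 4) would drop
  )

summaries
```

This gives a one-row result with the three summaries side by side, at the cost of repeating the condition.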