String based filtering in dplyr - NSE


I'd like to use dplyr's new NSE notations (version >= 0.6) for a dynamic filter on my data. Let's say I have the following dummy dataset:

df = data_frame(x = 1:10, y = 10:1, z = 10 * runif(10))

If I now want to filter column tofilter = "x" for values greater than or equal to 5, I know I can do:

df %>% 
  filter((!!rlang::sym(tofilter)) >= 5)

Question 1

What if I also want to change the filtering operator dynamically? Say I have a Shiny app in which the user chooses, via a selectInput, whether to filter the data for values greater than 5, equal to 5, or lower than 5.

What I'd like to do is something along the lines of:

op = ">="
val = 5
filt_expr = paste("x", op, val)
df %>% 
  filter(filt_expr)

Obviously, this does not work. I have played a bit with rlang quosures, symbols, etc., but haven't found the right way to "quote" my inputs.

Question 2

Bonus question: what if I want to apply multiple filters? Do I need to loop, or can I create a list of filtering expressions and apply them all in one go?

An example of this is a Shiny app where the user can type multiple conditions to apply to the data, so that we have a dynamically changing list of the format:

filt_expr_list = list("x >= 5", "y <= 10", "z >= 2")

and we want to dynamically apply them all, so that the output is equivalent to:

df %>%
  filter(x >= 5, y <= 10, z >= 2)

I guess this is in a sense a subset of question 1, since once I know how to quote the arguments correctly I think I could do something like:

filt_expr = paste0(unlist(filt_expr_list), collapse = ", ")
df %>%
  filter(filt_expr)

but it would be nice to see whether there is a cleaner way.


There are 2 answers

Lionel Henry On BEST ANSWER

What if I want to dynamically change the operator of the filtering too

You can do it with tidy eval by unquoting a symbol representing the operator (note that I use expr() to illustrate the result of the unquoting):

lhs <- "foo"

# Storing the symbol `<` in `op`
op <- quote(`<`)

expr(`!!`(op)(!!sym(lhs), 5))
#> foo < 5

However it is cleaner to do it outside tidy eval with regular R code. Unquoting is only necessary when the symbol you unquote represents a column from the data frame, i.e. something that's not in the context. Here you can just store the operator in a variable and then call that variable in your filtering expression:

# Storing the function `<` in `op`
op <- `<`

expr(op(!!sym(lhs), 5))
#> op(foo, 5)
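
To connect this back to the question, here is a minimal sketch of how the operator could arrive as a string (for example from a Shiny input) and be turned into a function with base R's match.fun(). The helper name filter_by and the whitelist of operators are assumptions made for illustration, not part of dplyr or rlang.

```r
library(dplyr)
library(rlang)

df <- tibble(x = 1:10, y = 10:1)

# Hypothetical helper: filter `data` on one column, with the comparison
# operator supplied as a string such as ">=".
filter_by <- function(data, column, op, value) {
  # Assumed whitelist: reject anything that is not a plain comparison
  allowed <- c("<", "<=", "==", ">=", ">")
  stopifnot(op %in% allowed)

  # Look up the function named by the string, e.g. ">=" gives `>=`
  op_fun <- match.fun(op)

  # The column name is unquoted as a symbol; the value is unquoted so that
  # a column that happens to be called `value` cannot shadow it
  filter(data, op_fun(!!sym(column), !!value))
}

res <- filter_by(df, "x", ">=", 5)
```

Restricting op to a fixed set of strings means user input is never evaluated as code, which is usually what you want in an app.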

what if I want to apply multiple filters?

You save the expressions in a list and then you splice them in a call with !!!:

filters <- list(
  quote(x >= 5),
  quote(y <= 10),
  quote(z >= 2)
)

expr(df %>% filter(!!!filters))
#> df %>% filter(x >= 5, y <= 10, z >= 2)
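
If the conditions arrive as strings, as in the question's filt_expr_list, one possible approach is to parse each string into an expression with rlang::parse_expr() before splicing. The sketch below assumes the strings are trusted; the deterministic z column is made up here so that the result is reproducible.

```r
library(dplyr)
library(rlang)

# z is a fixed sequence (an assumption for this example) instead of runif()
df <- tibble(x = 1:10, y = 10:1, z = seq(0.5, 5, by = 0.5))

filt_expr_list <- list("x >= 5", "y <= 10", "z >= 2")

# parse_expr() turns one string into one unevaluated expression
filters <- lapply(filt_expr_list, parse_expr)

# Splice the list so each expression becomes a separate filter() argument
res <- df %>% filter(!!!filters)
```

Parsing and evaluating user-typed text runs whatever code it contains, so in a public-facing app it is probably safer to build the expressions from validated pieces instead.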

Note: I said above that it is not necessary to unquote variables from the context, but it is still often a good idea to do so if you're writing a function that takes the data frame as input. Since the data frame varies, you don't know in advance which columns it contains, and columns always take precedence over objects you have defined in the environment. Here this is not an issue because op is called as a function: when R looks up a function, it skips any similarly named object that is not a function, including a column in the data frame.

Vishal Katti On

You can actually do this:

    df = data_frame(x = 1:10, y = 10:1, z = 10 * runif(10))
    op = ">="
    val = 5
    filt_expr = paste("x", op, val)

    # parse() reads from a file by default, so the string must be passed as `text`
    df %>% filter(eval(parse(text = filt_expr)))