
dplyr ambiguity with external vector when column on LHS


Suppose I have:

df <- tibble::tibble(a=c(1,0,0),b=c(0,1,0),colname=c(0,0,1))
colname <- "a"

If I do

df %>% dplyr::select(colname)

The column takes precedence, and it returns:

# A tibble: 3 × 1
  colname
    <dbl>
1       0
2       0
3       1

If I want to evaluate the env-variable colname, I have to use all_of:

> df %>% dplyr::select(all_of(colname))
# A tibble: 3 × 1
      a
  <dbl>
1     1
2     0
3     0

I don't understand how to resolve the ambiguity when the column is on the left-hand side, though. Suppose I want to change all 0 values to NA in the column whose name is stored in the env-variable colname:

df %>% dplyr::mutate(colname=replace(colname, colname==0, NA))

But this changes the third column (colname), not the first column (a).

There is 1 answer

Answer by elsa

Yes. dplyr::mutate() has to decide whether colname refers to the existing column named 'colname' in the data frame or to the variable in the calling environment.

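In your current code, the name on the left of = is taken literally as the output column, and on the right dplyr::mutate() looks colname up in the data frame before the environment. Both sides therefore refer to the colname column, and that is the column whose values get replaced.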
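The lookup order can be made visible with the .data and .env pronouns that dplyr's data masking provides. This is only a small sketch, assuming dplyr 1.0 or later; the names from_data, from_env and masked are made up for illustration.

```r
library(dplyr)

df <- tibble::tibble(a = c(1, 0, 0), b = c(0, 1, 0), colname = c(0, 0, 1))
colname <- "a"

masked <- df %>%
  mutate(
    from_data = .data$colname,  # the column named "colname"
    from_env  = .env$colname    # the environment variable, recycled to every row
  )

masked
```

Here from_data repeats the values of the colname column, while from_env holds the string "a" in every row.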

You can use the across() function together with mutate() to specify which columns you want to operate on as @MrFlick mentioned:

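library(dplyr)  # provides %>%, mutate(), across() and all_of()

df <- tibble::tibble(a=c(1,0,0),b=c(0,1,0),colname=c(0,0,1))
colname <- "a"

# all_of() reads colname from the environment, so column a is selected
df %>% mutate(across(all_of(colname), ~replace(., . == 0, NA)))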

In this code, across() applies ~replace(., . == 0, NA) to every column selected by all_of(colname), which here is just a. The formula ~replace(., . == 0, NA) is shorthand for an anonymous function in which . stands for the current column; it replaces 0 values with NA.
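To make the shorthand concrete, the same call can be written with an ordinary named function. This is a sketch only; zero_to_na is a hypothetical helper name, not something from dplyr.

```r
library(dplyr)

df <- tibble::tibble(a = c(1, 0, 0), b = c(0, 1, 0), colname = c(0, 0, 1))
colname <- "a"

# Hypothetical helper: does what the formula ~replace(., . == 0, NA) does
zero_to_na <- function(x) replace(x, x == 0, NA)

with_formula  <- df %>% mutate(across(all_of(colname), ~replace(., . == 0, NA)))
with_function <- df %>% mutate(across(all_of(colname), zero_to_na))
```

Both calls should produce the same tibble, so the choice between them is a matter of readability.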

The result of this operation would be:

# A tibble: 3 × 3
      a     b colname
  <dbl> <dbl>   <dbl>
1     1     0       0
2    NA     1       0
3    NA     0       1
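If you would rather keep a plain mutate() call without across(), one alternative (not part of the suggestion above) is to inject the name with !! and :=. This is a sketch that assumes a recent dplyr and rlang; result is just an illustrative variable name.

```r
library(dplyr)

df <- tibble::tibble(a = c(1, 0, 0), b = c(0, 1, 0), colname = c(0, 0, 1))
colname <- "a"

# !!colname is injected before data masking happens, so it is the string "a"
# on both sides of :=, never the column called colname.
result <- df %>%
  mutate(!!colname := replace(.data[[!!colname]], .data[[!!colname]] == 0, NA))

result
```

The := operator is needed because R does not allow !! on the left of a plain =.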