Suppose I have:
df <- tibble::tibble(a=c(1,0,0),b=c(0,1,0),colname=c(0,0,1))
colname <- "a"
If I do
df %>% dplyr::select(colname)
The column takes precedence, and it returns:
# A tibble: 3 × 1
colname
<dbl>
1 0
2 0
3 1
If I want to evaluate the env-variable colname, I have to use all_of:
> df %>% dplyr::select(all_of(colname))
# A tibble: 3 × 1
a
<dbl>
1 1
2 0
3 0
I don't understand how to resolve the ambiguity when the column is on the LHS, though.
Suppose I want to change all 0 values of the column in the env-variable colname to NA:
df %>% dplyr::mutate(colname=replace(colname, colname==0, NA))
But this will change the third column (colname), not the first column (a).
Yes, I believe dplyr::mutate() needs to know whether colname refers to the existing column named 'colname' in the data frame or to the variable in the environment.
In your current code, the name on the left-hand side of = is taken literally, and on the right-hand side the data frame column is looked up before the environment variable. So mutate() interprets colname as the literal column name and replaces values in the colname column.
You can use across() together with mutate() to specify which columns you want to operate on, as @MrFlick mentioned:
df %>% dplyr::mutate(dplyr::across(all_of(colname), ~replace(., . == 0, NA)))
In this code, across() applies ~replace(., . == 0, NA) to the column selected by all_of(colname). The ~replace(., . == 0, NA) part is shorthand for an anonymous function that replaces 0 values with NA.
The result of this operation is that the zeros in column a become NA, while columns b and colname are left unchanged.
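For reference, here is a minimal, self-contained sketch of the across() approach on the example data from the question. It also shows one possible alternative that is not part of the original answer: injecting the name on the left-hand side with !! and :=, and turning the string into a symbol with rlang::sym() for the right-hand side. The names res_across, res_inject and col are only illustrative.

```r
library(dplyr)
library(tibble)

df <- tibble(a = c(1, 0, 0), b = c(0, 1, 0), colname = c(0, 0, 1))
colname <- "a"

# across() + all_of(): all_of() reads the environment variable colname ("a"),
# so only column a is modified.
res_across <- df %>%
  mutate(across(all_of(colname), ~ replace(.x, .x == 0, NA)))

# Alternative (an assumption, not from the original answer):
# !! injects the value of the environment variable before the data mask is consulted.
col <- rlang::sym(colname)  # the symbol `a`
res_inject <- df %>%
  mutate(!!colname := replace(!!col, !!col == 0, NA))

res_across$a        # 1 NA NA
res_across$colname  # 0 0 1 (untouched)
```

Both variants should give the same tibble. The across() form is probably the simpler choice when the function only needs the column's own values, while the injection form may be handier when the new value depends on several columns.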