Coalesce inside an across statement using column name and a prefix

86 views Asked by At

Again I feel like I should be able to do something to seems not quite complicated but I can't figure it out.

I have a dataframe df with variables a1, v_a1, a2, v_a2 and so on. All I want to do is mutate my dataframe so that a1 = coalesce(a1, v_a1)and a2 = coalesce(a2, v_a2)

I tried many different thing but keep it doesn't quite work yet.

Here is a toy example and things I tried :

df <- data.frame(
  a1 = c(1,NA,2,NA),
  v_a1 = c(NA,0,NA,5),
  a2 = c(4,NA,8,NA),
  v_a2 = c(NA,7,NA,NA),
  a11 = c("just here","to force","the use of paste0","inside the coalesce"))
df %>% mutate(
  across(c(a1,a2),
         ~ coalesce(!!!select(matches(paste0(cur_column(),"$"))))))

This throws an error about cur_column() which "Must only be used inside across()". I had the same error when first declaring a function with a df and variable parameters to wrap the coalesce but the it seemed not to appreciate the .data[[cur_column()]] inside the across

df %>% mutate(
  across(c(a1,a2),
         ~ coalesce(across(all_of(c(cur_column(), paste0("v_", cur_column())))))))

This one doesn't have an error but the output is really weird turning the tibble into string and making some pasting. It puzzles me there.

This topic is veeeery close How to coalesce chunks of columns based on a prefix pattern but the coalesce uses a raw string inside the contains which I don't have (hence the need for cur_column and moreover it uses a for loop while I think I should be able to do it within an across...but maybe not !

Any help or piece of advice much appreciated !

2

There are 2 answers

3
jkatam On BEST ANSWER

Please try the below code

library(tidyverse)

df2 <- purrr::map2_dfc(df[,str_detect(names(df),'\\ba1\\b|\\ba2\\b')], df[,str_detect(names(df),'v_')], ~ coalesce(.x,.y) )

df3 <- cbind(df2,df[,str_detect(names(df),'v_|a11')])


  a1 a2 v_a1 v_a2                 a11
1  1  4   NA   NA           just here
2  0  7    0    7            to force
3  2  8   NA   NA   the use of paste0
4  5 NA    5   NA inside the coalesce

with across please check the code below

df %>% mutate(across(c(a1,a2), ~ coalesce(.x, get(paste('v',cur_column(),sep = '_'))) ))


  a1 v_a1 a2 v_a2                 a11
1  1   NA  4   NA           just here
2  0    0  7    7            to force
3  2   NA  8   NA   the use of paste0
4  5    5 NA   NA inside the coalesce

0
elka On

Well, don't know why and how I did not find this yesterday, espacially as I tried with a custom coalesce function, but anyway, found something which works and is as easy as this :

library(dplyr)

df <- data.frame(
  a1 = c(1,NA,2,NA),
  v_a1 = c(NA,0,NA,5),
  a2 = c(4,NA,8,NA),
  v_a2 = c(NA,7,NA,NA),
  a11 = c("just","another","column","character"))

my_coalesce <- function(df,x){
  vx <- paste0("v_", x)
  return(coalesce(df[[x]], df[[vx]]))
}

df %>% mutate(across(c(a1, a2),
                     ~ my_coalesce(df=.data, x=cur_column())))

Although you need the function first to avoid the "Caused by error in cur_column(): ! Must only be used inside across()." issue