Conditional concatenation of string variables in tidyverse


I am trying to conditionally concatenate string variables using tidyverse.

Here is the toy data

library(tidyverse)

df <- tibble(id = paste0("id_", 1:4),
             outcome = rep(x = c("simon",
                                 "garfunkel"),
                           times = 2),
             worth = rep(x = c("awesome",
                               "disposable"),
                         times = 2))

df

#   id    outcome   worth     
#   <chr> <chr>     <chr>     
# 1 id_1  simon     awesome   
# 2 id_2  garfunkel disposable
# 3 id_3  simon     awesome   
# 4 id_4  garfunkel disposable

I can use unite() from tidyr to combine the id column and 'worth' column like so

df %>%
  unite("id", c(id, worth))

#   id              outcome  
#   <chr>           <chr>    
# 1 id_1_awesome    simon    
# 2 id_2_disposable garfunkel
# 3 id_3_awesome    simon    
# 4 id_4_disposable garfunkel

But there are a few problems with this, some problems with the output and some problems with the way I generated it.

First, I would like to retain the original columns, whereas unite() by default removes the columns it combines and keeps only the concatenated result. I tried calling unite() within mutate() but this generated an error.
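On this first point, a minimal sketch of one way to keep the original columns: unite() takes a `remove` argument, and setting it to FALSE leaves the input columns in place. The column name `new_id` is an illustrative choice, not something from the original post.

```r
library(tidyverse)

# Toy data from the question
df <- tibble(id = paste0("id_", 1:4),
             outcome = rep(c("simon", "garfunkel"), times = 2),
             worth = rep(c("awesome", "disposable"), times = 2))

# remove = FALSE keeps id and worth alongside the new united column
# (new_id is a hypothetical name chosen for this sketch)
kept <- df %>%
  unite("new_id", c(id, worth), remove = FALSE)

kept
```

unite() works on whole data frames rather than on vectors, which is probably why it fails inside mutate(); paste() is the vector-level equivalent there.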

Second, and most important, rather than simply concatenating two columns I would like the new concatenated id column to combine the id column and the worth column, conditional on the outcome column. I tried to do this using case_when() within mutate() but was unsure where to put the paste0() call, and whether unite() can be used inside case_when() at all.

Third, and related to the second point, I need to concatenate only part of the worth column into the id column, ideally using a regex substitution that captures only the first x letters of the worth column.
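For the third point, one possible regex approach is sketched below. It builds a pattern that captures the first `n_chars` characters and replaces the whole string with that capture group via stringr's str_replace(). The names `n_chars`, `prefix_pattern`, and `worth_prefix` are hypothetical, introduced only for this illustration.

```r
library(tidyverse)

# Toy data from the question
df <- tibble(id = paste0("id_", 1:4),
             outcome = rep(c("simon", "garfunkel"), times = 2),
             worth = rep(c("awesome", "disposable"), times = 2))

n_chars <- 4  # hypothetical prefix length, the "x" in "first x letters"

# Capture the first n_chars characters, then match the rest of the string
prefix_pattern <- paste0("^(.{", n_chars, "}).*$")

# Replace each value with its captured prefix; values shorter than
# n_chars do not match the pattern and are left unchanged
with_prefix <- df %>%
  mutate(worth_prefix = str_replace(worth, prefix_pattern, "\\1"))

with_prefix
```

When the prefix length is fixed, substr(worth, 1, n_chars) gives the same result for these values without a regex.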

Basically I need the new dataset to look like the data frame below, but produced using conditional and string-concatenation mechanics:

tibble(id = paste0(paste0("id_", 1:4), 
                   rep(c("_awes", "_disp"), times = 2)),
       outcome = rep(x = c("simon",
                           "garfunkel"),
                     times = 2),
       worth = rep(x = c("awesome",
                         "disposable"),
                   times = 2))

#   id          outcome   worth     
#   <chr>       <chr>     <chr>     
# 1 id_1_awes   simon     awesome   
# 2 id_2_disp   garfunkel disposable
# 3 id_3_awes   simon     awesome   
# 4 id_4_disp   garfunkel disposable

Any help much appreciated.

(p.s. apologies if you think Garfunkel was also awesome)

There are 2 answers

Answer by Onyambu (best answer)

df %>% 
   mutate(worth1 = substr(worth, 1, 4)) %>%
   unite(id, id, worth1)

# A tibble: 4 x 3
  id        outcome   worth     
  <chr>     <chr>     <chr>     
1 id_1_awes simon     awesome   
2 id_2_disp garfunkel disposable
3 id_3_awes simon     awesome   
4 id_4_disp garfunkel disposable

Answer by llewmills

I put up a very confusing example. As @camille pointed out, it had some redundancy: the column I wanted to condition on followed an identical pattern to the column I wanted to extract from, which removed the need for conditioning at all. All I can say is mea culpa. However, since people have already provided solutions based on the original, confusing dataset, I will leave the example as-is. Based on their answers, the following is what I was looking for:

df %>%
  mutate(newid = case_when(outcome == "simon" ~ paste(id, substr(worth, 1, 4), sep = "_"),
                           outcome == "garfunkel" ~ paste(id, substr(worth, 1, 4), sep = "_")))

#   id    outcome   worth      newid    
#   <chr> <chr>     <chr>      <chr>    
# 1 id_1  simon     awesome    id_1_awes
# 2 id_2  garfunkel disposable id_2_disp
# 3 id_3  simon     awesome    id_3_awes
# 4 id_4  garfunkel disposable id_4_disp

This solution conditions on the outcome variable, extracts the first four characters of the worth variable, and combines them with the id variable. Thanks to the responders for helping me with this.
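Because both branches above produce the same expression, the condition has no visible effect on the toy data. As a rough sketch of a case where it would matter, assume a made-up rule in which "simon" rows take the first four characters of worth and "garfunkel" rows take only the first three. The rule, the fallback branch, and the name `with_newid` are assumptions for illustration only.

```r
library(tidyverse)

# Toy data from the question
df <- tibble(id = paste0("id_", 1:4),
             outcome = rep(c("simon", "garfunkel"), times = 2),
             worth = rep(c("awesome", "disposable"), times = 2))

# Hypothetical rule: the prefix length depends on outcome
with_newid <- df %>%
  mutate(newid = case_when(
    outcome == "simon"     ~ paste(id, substr(worth, 1, 4), sep = "_"),
    outcome == "garfunkel" ~ paste(id, substr(worth, 1, 3), sep = "_"),
    TRUE                   ~ id  # fallback: leave id unchanged for any other outcome
  ))

with_newid
```

The paste() call goes on the right-hand side of each `~`, so every branch returns a character vector of the same type; without the fallback branch, unmatched rows would get NA.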