How to select columns sharing common but unknown string elements using dplyr?

76 views Asked by At

Suppose I have data where I have count and percent columns based on some prior operations where the column names are data dependent. My question is how can I slect columns containing a common string when I dont' know the string in advance because its data dependent? Some toy data where we do know the column names (but lets pretend we don't!)

library(tidyverse)

mtcars <- mtcars %>% 
    mutate(cnt_something = sample(0:100, nrow(mtcars)),
           cnt_otherthing = sample(0:100, nrow(mtcars)),
           pct_something = paste0( cnt_something, "%"),
           pct_otherthing = paste0( cnt_otherthing, "%"))

So in real data the strings something and otherthing result from previous data dependent steps and actually there could be many columns, but I know there will alway be a pair of columns in the form cnt_ and pct_. My question is therefore how can I select the cnt_***** and pct_**** with matching *****'s for the next manipulation (e.g. paste0()) without knowing the different possible ***** strings.

The desired output is something like this:

                   result_something result_otherthing
Mazda RX4                   49 (49%)          82 (82%)
Mazda RX4 Wag               20 (20%)          72 (72%)
Datsun 710                  37 (37%)          75 (75%)
Hornet 4 Drive              22 (22%)          85 (85%)
Hornet Sportabout           53 (53%)        100 (100%)
2

There are 2 answers

2
LMc On BEST ANSWER

You can do a series of pivots to transform this data:

mtcars |>
  rownames_to_column("car") |>
  pivot_longer(matches("^(cnt|pct)"), names_sep = "_", names_to = c(".value", "grp")) |>
  mutate(x = str_glue("{cnt} ({pct})")) |>
  pivot_wider(names_from = grp, names_prefix = "result_", values_from = x, id_cols = car) |>
  column_to_rownames("car")

Note: in the tidyverse row names are discouraged, but since your example had them I used rownames_to_column and column_to_rownames from the tibble package to preserve them. This might not be necessary in your actual data.

3
Nir Graham On

Here is one way, using rlang to parse constructed expressions

library(tidyverse)
library(rlang)

mtcars <- mtcars |> select() |>  
  mutate(cnt_something = sample(0:100, nrow(mtcars)),
         cnt_otherthing = sample(0:100, nrow(mtcars)),
         pct_something = paste0( cnt_something, "%"),
         pct_otherthing = paste0( cnt_otherthing, "%"))


funky_funks <- function(data,
         prefix_1,
         prefix_2,
         prefix_res){
nm1 <- names(mtcars)
(nm2 <- nm1[startsWith(nm1,prefix_1)])
(res_names <- str_replace(nm2,fixed(prefix_1),prefix_res))

plist <- map(nm2,\(n1){
    n2 <- str_replace(n1,prefix_1,prefix_2)
    myexpr <- paste0('paste0(',n1,'," (",',n2,',")")')
   rlang::parse_expr(myexpr)
}) |> set_names(res_names)

data |> mutate(!!!plist)
}


funky_funks(mtcars,
            prefix_1 = "cnt_",
            prefix_2 = "pct_",
            prefix_res = "result_")