How to select columns sharing common but unknown string elements using dplyr?

Question

How to select columns sharing common but unknown string elements using dplyr?

72 views Asked by user2498193 At 08 December 2024 at 15:52

Suppose I have data where I have count and percent columns based on some prior operations where the column names are data dependent. My question is how can I slect columns containing a common string when I dont' know the string in advance because its data dependent? Some toy data where we do know the column names (but lets pretend we don't!)

library(tidyverse)

mtcars <- mtcars %>% 
    mutate(cnt_something = sample(0:100, nrow(mtcars)),
           cnt_otherthing = sample(0:100, nrow(mtcars)),
           pct_something = paste0( cnt_something, "%"),
           pct_otherthing = paste0( cnt_otherthing, "%"))

So in real data the strings something and otherthing result from previous data dependent steps and actually there could be many columns, but I know there will alway be a pair of columns in the form cnt_ and pct_. My question is therefore how can I select the cnt_***** and pct_**** with matching *****'s for the next manipulation (e.g. paste0()) without knowing the different possible ***** strings.

The desired output is something like this:

                   result_something result_otherthing
Mazda RX4                   49 (49%)          82 (82%)
Mazda RX4 Wag               20 (20%)          72 (72%)
Datsun 710                  37 (37%)          75 (75%)
Hornet 4 Drive              22 (22%)          85 (85%)
Hornet Sportabout           53 (53%)        100 (100%)

Original Q&A

There are 2 answers

Nir Graham On 12 February 2024 at 15:24

Here is one way, using rlang to parse constructed expressions

library(tidyverse)
library(rlang)

mtcars <- mtcars |> select() |>  
  mutate(cnt_something = sample(0:100, nrow(mtcars)),
         cnt_otherthing = sample(0:100, nrow(mtcars)),
         pct_something = paste0( cnt_something, "%"),
         pct_otherthing = paste0( cnt_otherthing, "%"))


funky_funks <- function(data,
         prefix_1,
         prefix_2,
         prefix_res){
nm1 <- names(mtcars)
(nm2 <- nm1[startsWith(nm1,prefix_1)])
(res_names <- str_replace(nm2,fixed(prefix_1),prefix_res))

plist <- map(nm2,\(n1){
    n2 <- str_replace(n1,prefix_1,prefix_2)
    myexpr <- paste0('paste0(',n1,'," (",',n2,',")")')
   rlang::parse_expr(myexpr)
}) |> set_names(res_names)

data |> mutate(!!!plist)
}


funky_funks(mtcars,
            prefix_1 = "cnt_",
            prefix_2 = "pct_",
            prefix_res = "result_")

**LMc** · Accepted Answer · 2024-02-13 16:40:07

You can do a series of pivots to transform this data:

mtcars |>
  rownames_to_column("car") |>
  pivot_longer(matches("^(cnt|pct)"), names_sep = "_", names_to = c(".value", "grp")) |>
  mutate(x = str_glue("{cnt} ({pct})")) |>
  pivot_wider(names_from = grp, names_prefix = "result_", values_from = x, id_cols = car) |>
  column_to_rownames("car")

Note: in the tidyverse row names are discouraged, but since your example had them I used rownames_to_column and column_to_rownames from the tibble package to preserve them. This might not be necessary in your actual data.

TechQA.

How to select columns sharing common but unknown string elements using dplyr?

There are 2 answers

Related Questions in R

Related Questions in SELECT

Related Questions in DPLYR

Popular Questions

Popular Tags

Trending Questions