R: Long to Wide dataframe with concatenation of column values into one

68 views Asked by At

I have a dataframe with the following style.

df = data.frame(Process = rep("Global", 6),
                Category = c("Category_1", "Category_1", "Category_2", "Category_2", "Category_3", "Category_3"),
                Metric = c("Metric_1_1", "Metric_1_2", "Metric_2_1", "Metric_2_2", "Metric_3_1", "Metric_3_2"),
                level_1 = c(0.50, 0.50, 0.50, 0.50, 0.50, 0.50),
                level_2 = c(0.75, 0.50, 0.50, 0.50, 0.50, 0.50))

I would like to convert it to the following dataframe.

df_wide = data.frame(Global_Category_1_Metric_1_1_Level_1 = c(0.5),
                     Global_Category_1_Metric_1_1_Level_2 = c(0.75),
                     Global_Category_1_Metric_1_2_Level_1 = c(0.5),
                     Global_Category_1_Metric_1_2_Level_2 = c(0.5),
                     Global_Category_2_Metric_2_1_Level_1 = c(0.5),
                     Global_Category_2_Metric_2_1_Level_2 = c(0.5),
                     Global_Category_1_Metric_2_2_Level_1 = c(0.5),
                     Global_Category_1_Metric_2_2_Level_2 = c(0.5),
                     Global_Category_3_Metric_3_1_Level_1 = c(0.5),
                     Global_Category_3_Metric_3_1_Level_2 = c(0.5),
                     Global_Category_1_Metric_3_2_Level_1 = c(0.5),
                     Global_Category_1_Metric_3_2_Level_2 = c(0.5))

Basically, I want to concatenate the values of the Process, Category, Metric columns and combine it with the level_1 and level_2 columns to create one column for each "new" metric.

Ideally, I would like for the new name to have colons between the original values, i.e., Global:Category_1:Metric_1_1:Level_1.

I'm not sure how to achieve that with dplyr. However, if necessary I'm not looking for a solution that exclusively uses dplyr.

2

There are 2 answers

0
ThomasIsCoding On BEST ANSWER

Probably you can try

df %>%
    pivot_longer(starts_with("level")) %>%
    pivot_wider(names_from = !value, names_sep = ":")

and with str you will see

> df %>%
+     pivot_longer(starts_with("level")) %>%
+     pivot_wider(names_from = !value, names_sep = ":") %>%
+     str()
tibble [1 × 12] (S3: tbl_df/tbl/data.frame)
 $ Global:Category_1:Metric_1_1:level_1: num 0.5
 $ Global:Category_1:Metric_1_1:level_2: num 0.75
 $ Global:Category_1:Metric_1_2:level_1: num 0.5
 $ Global:Category_1:Metric_1_2:level_2: num 0.5
 $ Global:Category_2:Metric_2_1:level_1: num 0.5
 $ Global:Category_2:Metric_2_1:level_2: num 0.5
 $ Global:Category_2:Metric_2_2:level_1: num 0.5
 $ Global:Category_2:Metric_2_2:level_2: num 0.5
 $ Global:Category_3:Metric_3_1:level_1: num 0.5
 $ Global:Category_3:Metric_3_1:level_2: num 0.5
 $ Global:Category_3:Metric_3_2:level_1: num 0.5
 $ Global:Category_3:Metric_3_2:level_2: num 0.5
1
lotus On

You can do:

library(tidyr)
library(stringr)

df |>
  pivot_wider(
    values_from = starts_with("level"),
    names_from = -starts_with("level"),
    names_glue = "{Process}:{Category}:{Metric}:{str_to_title(.value)}",
    names_vary = "slowest" # Used to match desired column order
  )

Note that while you can use : in column names, it is not a legal name and will generally require wrapping in backticks to use in other operations.