Make a function that groups new variables


I have a dataset like this:

dt <- data.table(ID = c(1, 2, 3, 4),
                 q1= c(1, NA, NA, 1), 
                 q2= c(1, 3, 2, NA), 
                 q3= c(2, 1, 4, 4))

I need to create new variables based on q1, q2 and q3, but I want to group the values: values 1 and 2 should become YES, value 3 should become NO, and value 4 should become IDK. Missing values should stay NA.

So the final dataset should be:

ID q1 q2 q3 q1_cat q2_cat q3_cat
1  1  1  2  YES    YES    YES
2  NA 3  1  NA     NO     YES
3  NA 2  4  NA     YES    IDK
4  1  NA 4  YES    NA     IDK


        

There are 2 answers

dufei
library(tidyverse)

df <- tibble(
  ID = c(1, 2, 3, 4),
  q1 = c(1, NA, NA, 1),
  q2 = c(1, 3, 2, NA),
  q3 = c(2, 1, 4, 4)
)

# Recode every column starting with "q" into a new "<name>_cat" column.
# Values not matched by any rule (here only NA) become NA.
df |> 
  mutate(across(
    starts_with("q"),
    \(x) case_match(x, c(1, 2) ~ "YES", 3 ~ "NO", 4 ~ "IDK"),
    .names = "{.col}_cat"
  ))
#> # A tibble: 4 × 7
#>      ID    q1    q2    q3 q1_cat q2_cat q3_cat
#>   <dbl> <dbl> <dbl> <dbl> <chr>  <chr>  <chr> 
#> 1     1     1     1     2 YES    YES    YES   
#> 2     2    NA     3     1 <NA>   NO     YES   
#> 3     3    NA     2     4 <NA>   YES    IDK   
#> 4     4     1    NA     4 YES    <NA>   IDK

Created on 2023-03-23 with reprex v2.0.2

Allan Cameron

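In base R, after converting the data.table to a plain data frame with data.table's setDF():

# Convert dt to a data.frame by reference (setDF() comes from data.table)
setDF(dt)
# Codes 1 and 2 map to 'YES', 3 to 'NO', 4 to 'IDK'; indexing with NA yields NA
dt[,5:7] <- setNames(lapply(dt[,-1], function(x) c('YES', 'YES', 'NO', 'IDK')[x]),
                     paste0(names(dt[-1]), '_cat'))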

result:

dt
#>   ID q1 q2 q3 q1_cat q2_cat q3_cat
#> 1  1  1  1  2    YES    YES    YES
#> 2  2 NA  3  1   <NA>     NO    YES
#> 3  3 NA  2  4   <NA>    YES    IDK
#> 4  4  1 NA  4    YES   <NA>    IDK
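
Since the question starts from a data.table and asks for a function, the same lookup idea could probably be wrapped in a small helper and applied by reference, without converting to a data frame. This is only a sketch: `recode_answer` and `q_cols` are names made up for illustration, and it assumes the codes are always whole numbers from 1 to 4 (or NA).

```r
library(data.table)

dt <- data.table(ID = c(1, 2, 3, 4),
                 q1 = c(1, NA, NA, 1),
                 q2 = c(1, 3, 2, NA),
                 q3 = c(2, 1, 4, 4))

# Hypothetical helper: look up the label by position.
# Assumes x only contains 1, 2, 3, 4 or NA; indexing with NA returns NA.
recode_answer <- function(x) {
  labels <- c("YES", "YES", "NO", "IDK")
  labels[x]
}

# Columns to recode (assumed names, taken from the question's example)
q_cols <- c("q1", "q2", "q3")

# Add q1_cat, q2_cat, q3_cat by reference, one per source column
dt[, paste0(q_cols, "_cat") := lapply(.SD, recode_answer), .SDcols = q_cols]
```

Keeping the mapping inside one function means a new code only has to be added in a single place, and `.SDcols` limits the recoding to the question columns so ID is left alone.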