What I have:
library("dplyr")
mtcars %>% count(cyl, gear)
#>   cyl gear  n
#> 1   4    3  1
#> 2   4    4  8
#> 3   4    5  2
#> 4   6    3  2
#> 5   6    4  4
#> 6   6    5  1
#> 7   8    3 12
#> 8   8    5  2
What I need:
#>   variable1 category1 variable2 category2  n
#> 1       cyl         4      gear         3  1
#> 2       cyl         4      gear         4  8
#> 3       cyl         4      gear         5  2
#> 4       cyl         6      gear         3  2
#> 5       cyl         6      gear         4  4
#> 6       cyl         6      gear         5  1
#> 7       cyl         8      gear         3 12
#> 8       cyl         8      gear         5  2
The catch is that the variables (cyl and gear in this case) passed to count() are not fixed, and moreover the number of variables can vary from 1 upwards. They will be passed as arguments to a broader function. Hence I'm looking for a solution that would work nicely with curly-curly or similar. The names of the variables follow no pattern.
I've considered using multiple calls to tidyr::pivot_longer() but I can't work out how this would work with a varying number of variables.
I thought a better approach may be to use dplyr::across() along with dplyr::cur_column(). Something like the pseudo-code below:
var_count <- function(cnt_var) {
  mtcars %>%
    count(across({{ cnt_var }})) %>% # this works as intended
    mutate(across({{ cnt_var }}, \(col) cur_column(), .names = "category")) %>% # an attempt to create the 'variable' names. doesn't work when length(cnt_var) > 1
    rename_with(.cols = {{ cnt_var }}, .fn = "category") # a thought about how to create the 'category' columns
}
var_count(cnt_var = c(cyl)) # this ideally should produce one name-value pair: variable1 and category1
var_count(cnt_var = c(cyl, gear)) # this should produce two pairs: variable1, category1, variable2, category2
var_count(cnt_var = c(cyl, gear, vs)) # this should produce three pairs, etc
I'd ideally like a tidyverse solution, but all suggestions are most welcome. Thanks folks!
As you already guessed, I would go for across(), cur_column() and some renaming. As a first step I create the category and variable columns, then get rid of the original columns. Afterwards I use rename_with() and stringr::str_replace() to replace the column-name suffixes with numeric suffixes.
UPDATE: Thanks to the comment by @lotus, we can make the function more concise by using .keep = "unused" and .before = 1 to get rid of the select(), and by doing the renaming in across() to get rid of the rename_with().
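For reference, here is a minimal sketch of what the concise version might look like. It is a reconstruction under assumptions, not necessarily the exact code from the answer: in particular, building the numeric suffix with match(.col, unique(.col)) inside the .names glue string is my own choice, and it relies on across() evaluating arbitrary R expressions in that string. The function keeps the question's signature and the hard-coded mtcars data.

```r
library(dplyr)

var_count <- function(cnt_var) {
  mtcars %>%
    count(across({{ cnt_var }})) %>%
    mutate(
      across(
        {{ cnt_var }},
        list(
          variable = \(col) cur_column(), # name of the counted column
          category = \(col) col           # its value in this row
        ),
        # Assumption: inside the glue spec, .col holds each column name
        # repeated once per function, so match() against its unique values
        # yields the column's position (variable1, category1, variable2, ...).
        .names = "{.fn}{match(.col, unique(.col))}"
      ),
      .keep = "unused", # drop the original columns consumed by across()
      .before = 1       # place the new pairs in front of n
    )
}

var_count(c(cyl))           # one pair:    variable1, category1
var_count(c(cyl, gear))     # two pairs:   variable1, category1, variable2, category2
var_count(c(cyl, gear, vs)) # three pairs, and so on
```

Listing variable before category in the function list is what gives the variable1, category1 ordering within each pair, since across() emits the new columns column by column and, within a column, function by function.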