Use tbl_summary to get the means of categorical data


I would like to produce both the means and the frequencies for a subset of my categorical variables.

mtcars2 <- mtcars %>% mutate(across(matches('cyl|gear|carb'), as.factor))

I know that I can use this to get separate output for my categorical and my continuous variables.

mtcars_out <- tbl_summary(mtcars2, 
                          statistic = list(all_numeric() ~ "{mean} ({sd})",
                                           all_categorical() ~ "{n} / {N} ({p}%)")) %>% as_tibble()

Since mtcars$cyl already has levels associated with it, I want to use mtcars2 as it is and generate the mean for that variable. I tried something like the following, but tbl_summary rejects it because cyl is a categorical variable.

mtcars_out <- tbl_summary(mtcars2, 
                          statistic = list(all_numeric() ~ "{mean} ({sd})",
                                           "cyl"~"{mean} ({sd})")) %>% as_tibble()

Error: Problem with `mutate()` input `tbl_stats`.
x There was an error assembling the summary statistics for 'cyl'
  with summary type 'categorical'.

There are 2 common sources for this error.
1. You have requested summary statistics meant for continuous
   variables for a variable being as summarized as categorical.
   To change the summary type to continuous, add the argument
  `type = list(cyl ~ 'continuous')`
2. One of the functions or statistics from the `statistic=` argument is not valid.
i Input `tbl_stats` is `pmap(...)`.

I tried specifying the type within the call, but that doesn't work either.

mtcars_out <- tbl_summary(mtcars2, 
                          type = list("cyl"~"continuous"),
                          statistic = list(all_numeric() ~ "{mean} ({sd})",
                                           all_categorical() ~ "{n} / {N} ({p}%)")) %>% as_tibble()



Error: Problem with `mutate()` input `summary_type`.
x Column 'cyl' is class "factor" and cannot be summarized as a continuous variable.
i Input `summary_type` is `assign_summary_type(...)`.

My actual dataset has 500 variables, and I have already specified the class of each one, so I don't want to change the class of any variable in my original data set. I want to specify it within the tbl_summary call.

Any help is greatly appreciated!!

1 Answer

Daniel D. Sjoberg

You've made cyl a factor and R will not allow you to take the mean of a factor variable.
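As a quick illustration of that point, here is a minimal base R sketch. The names `cyl_fct` and `cyl_num` are made up for this example. It shows that `mean()` on a factor gives `NA` (with a warning), and that converting through character recovers the original values, whereas `as.numeric()` applied directly to the factor would return the internal level codes.

```r
# cyl stored as a factor, as in the question's mtcars2
cyl_fct <- factor(mtcars$cyl)

# mean() on a factor returns NA and emits a warning
m_fct <- suppressWarnings(mean(cyl_fct))
is.na(m_fct)

# Convert via character to recover the original values (4, 6, 8);
# as.numeric(cyl_fct) alone would give the level codes 1, 2, 3
cyl_num <- as.numeric(as.character(cyl_fct))
mean(cyl_num)
#> [1] 6.1875
```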

I think the easiest thing to do is to keep both a numeric version and a factor version of the variable, and summarize both. You can then remove the extra header row (for the factor version of the variable).

library(gtsummary)
library(tidyverse)

tbl <- 
  mtcars %>%
  select(cyl) %>%
  mutate(fct_cyl = factor(cyl)) %>%
  tbl_summary(
    type = where(is.numeric) ~ "continuous",
    statistic = where(is.numeric) ~ "{mean} ({sd})",
    label = cyl ~ "No. Cylinders"
  ) 

# remove extra header row for factor variables
tbl$table_body <-
  tbl$table_body %>%
  filter(!(startsWith(variable, "fct_") & row_type == "label"))

# print table
tbl
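If the data already stores the variable as a factor, as mtcars2 does in the question, the same idea might be applied in the other direction: derive a numeric copy inside the pipeline so the stored data frame is left untouched. This is only a sketch under that assumption; the column name `cyl_num` and the label text are invented for illustration, and with many variables the conversion would need to be repeated for each factor whose mean is wanted.

```r
library(dplyr)
library(gtsummary)

# the question's data: cyl, gear and carb stored as factors
mtcars2 <- mtcars %>% mutate(across(matches("cyl|gear|carb"), as.factor))

tbl2 <-
  mtcars2 %>%
  select(cyl) %>%
  # hypothetical numeric copy; mtcars2 itself is not modified
  mutate(cyl_num = as.numeric(as.character(cyl))) %>%
  tbl_summary(
    # force the numeric copy to be summarized as continuous
    type = cyl_num ~ "continuous",
    statistic = cyl_num ~ "{mean} ({sd})",
    label = list(cyl ~ "No. Cylinders",
                 cyl_num ~ "No. Cylinders, mean (SD)")
  )

# print table
tbl2
```

The factor column keeps its default frequency summary, while the numeric copy supplies the mean and standard deviation.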
