Is there a function in dplyr/forcats to display count and percentages from a dataframe of dichotomous variables?

Question

Asked by Julius Heemelaar on 21 October 2020 at 09:32

I frequently get stuck when I want to summarise categorical variables in my dataset. My dataset contains dichotomous variables (yes/no) per patient. In the example below, "A" to "C" are risk factors that a person does or does not have.

A <- c("yes", "no", "yes", "no", "yes")
B <- c("no", "no", "yes", "yes", "no")
C <- c("yes", "no", "yes", "no", "yes")

df <- data.frame(A, B, C)

What I am trying to do is summarise all variables as factor level counts and percentages, ideally with one line of code. I tried using apply, forcats and dplyr but can't get it right. Can anyone help me? :)

I am hoping to get:

A: Yes 3 | %
   No  2 | %
B: ...
C: ...

The ultimate goal is to make a big summary table of baseline characteristics of a study population with both continuous and categorical variables. I will probably try to use CBCgrps or tableone.

Thank you!

There are 3 answers

Edo On 21 October 2020 at 09:49

With Base R there is a pretty simple solution:

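lapply(df, function(x) {
  # count the occurrences of each level in this column
  tb <- table(x)
  # put counts and proportions side by side, one row per level
  as.data.frame(cbind(n = tb, perc = tb / sum(tb)))
})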
#> $A
#>     n perc
#> no  2  0.4
#> yes 3  0.6
#> 
#> $B
#>     n perc
#> no  3  0.6
#> yes 2  0.4
#> 
#> $C
#>     n perc
#> no  2  0.4
#> yes 3  0.6
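
If a single table is easier to work with than a list, the same base R idea can be stacked into one data frame. This is only a sketch: the names per_var and summary_tbl are made up for illustration, and reporting percentages rounded to one decimal is an assumption, not something the question asked for.

```r
# Example data copied from the question
df <- data.frame(
  A = c("yes", "no", "yes", "no", "yes"),
  B = c("no", "no", "yes", "yes", "no"),
  C = c("yes", "no", "yes", "no", "yes")
)

# One small data frame per variable: level, count and percentage
# (per_var is a hypothetical helper name)
per_var <- lapply(names(df), function(v) {
  tb <- table(df[[v]])
  data.frame(
    var   = v,
    level = names(tb),
    n     = as.integer(tb),
    perc  = round(100 * as.numeric(prop.table(tb)), 1)
  )
})

# Stack the pieces into a single long table
summary_tbl <- do.call(rbind, per_var)
summary_tbl
```

The long layout has one row per variable and level, which tends to be easier to filter or export than a list of separate tables.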

Lsax On 21 October 2020 at 10:01

I wonder if this tidyverse solution suits you. Pivot to long format, then group by "group" and "answer" and count the cases in each combination. summarise() drops the last grouping level ("answer"), so the first percentage is calculated within each of the groups A, B and C. ungroup() then removes the remaining grouping by "group", so the second percentage is calculated over all rows.

library(tidyverse)
A <- c("yes", "no", "yes", "no", "yes")
B <- c("no", "no", "yes", "yes", "no")
C <- c("yes", "no", "yes", "no", "yes")

df <- data.frame(A, B, C)
df %>%
  pivot_longer(cols = everything(), names_to = "group", values_to = "answer") %>%
  group_by(group, answer) %>%
  summarise(n = n()) %>%
  mutate(percent_by_group = scales::percent(n / sum(n))) %>% 
  ungroup() %>% 
  mutate(percent_overall=scales::percent(n / sum(n)))

This is the result

 # A tibble: 6 x 5
  group answer     n percent_by_group percent_overall
  <chr> <chr>  <int> <chr>            <chr>          
1 A     no         2 40%              13.3%          
2 A     yes        3 60%              20.0%          
3 B     no         3 60%              20.0%          
4 B     yes        2 40%              13.3%          
5 C     no         2 40%              13.3%          
6 C     yes        3 60%              20.0%
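
Note that scales::percent() returns character strings, so the percentage columns above cannot be sorted or used in further arithmetic. A possible variation, assuming numeric proportions are wanted, uses count() in place of group_by() plus summarise(). The names result and prop are illustrative only.

```r
library(dplyr)
library(tidyr)

# Example data copied from the question
df <- data.frame(
  A = c("yes", "no", "yes", "no", "yes"),
  B = c("no", "no", "yes", "yes", "no"),
  C = c("yes", "no", "yes", "no", "yes")
)

result <- df %>%
  pivot_longer(cols = everything(), names_to = "group", values_to = "answer") %>%
  # count() tallies each group/answer pair and returns an ungrouped tibble
  count(group, answer) %>%
  # regroup by variable so the proportion is computed within A, B and C
  group_by(group) %>%
  mutate(prop = n / sum(n)) %>%
  ungroup()

result
```

Formatting with scales::percent() can then be left to the final display step, once all calculations are done.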

lotus On 21 October 2020 at 10:20 (Accepted Answer)

You can use forcats::fct_count():

library(purrr)
library(forcats)

map_df(df, fct_count, prop = TRUE, .id = "var")

# A tibble: 6 x 4
  var   f         n     p
  <chr> <fct> <int> <dbl>
1 A     no        2   0.4
2 A     yes       3   0.6
3 B     no        3   0.6
4 B     yes       2   0.4
5 C     no        2   0.4
6 C     yes       3   0.6
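
To get closer to the "A: Yes 3 | %" layout sketched in the question, the fct_count() result could be turned into display strings with sprintf(). This is a rough sketch: the names counts and display are made up here, and showing whole-number percentages is an assumption.

```r
library(purrr)
library(forcats)

# Example data copied from the question
df <- data.frame(
  A = c("yes", "no", "yes", "no", "yes"),
  B = c("no", "no", "yes", "yes", "no"),
  C = c("yes", "no", "yes", "no", "yes")
)

# Same call as in the answer: one row per variable and level
counts <- map_df(df, fct_count, prop = TRUE, .id = "var")

# Build one display line per row: variable, level, count, percentage
display <- sprintf(
  "%s: %s %d | %.0f%%",
  counts$var, as.character(counts$f), counts$n, 100 * counts$p
)

cat(display, sep = "\n")
```

For a full baseline table that mixes continuous and categorical variables, a dedicated package such as tableone, which the question already mentions, is likely the more practical route.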
