Show count of unique values in datasummary and combine two different tables of descriptive statistics using data


I really like the modelsummary package, and I'm trying to produce a single table that mixes different types of descriptive statistics. The first part is easy: I can get basic descriptives of var2 and var3. I can't get the second part right, though.

  1. I'd like to get a count of the unique entries of the variable var1, i.e. 26.
  2. I'd like to be able to combine the two into one table.

library(modelsummary)

var1 <- rep(LETTERS, 5)
var2 <- rnorm(length(var1), mean = 50, sd = 10)
var3 <- rnorm(length(var1), mean = 10, sd = 5)
df <- data.frame(var1, var2, var3)

# This gets the descriptives of var2 and var3
datasummary(var2 + var3 ~ Mean + SD + N, data = df)
# This returns a long column with the number of entries for each value of var1;
# I would just like the number 26 here, combined with the table above
datasummary(var1 ~ length, data = df)
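Outside of datasummary(), the single number asked for in point 1 is just length(unique()) applied to the column. A minimal base R sketch; NUnique is a hypothetical helper name, not part of modelsummary:

```r
# Rebuild the example data from the question (var2 and var3 are random draws)
var1 <- rep(LETTERS, 5)
var2 <- rnorm(length(var1), mean = 50, sd = 10)
var3 <- rnorm(length(var1), mean = 10, sd = 5)
df <- data.frame(var1, var2, var3)

# Hypothetical helper: number of distinct values in a vector
NUnique <- function(x) length(unique(x))

NUnique(df$var1)  # 26: LETTERS repeated 5 times has 26 distinct values
length(df$var1)   # 130 entries in total, which is what a plain count reports
```

The second call in the question shows that datasummary() splits a character variable like var1 into its levels, so a function placed in the formula is applied per level; that is why the count has to be computed outside the formula and added to the table separately.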

There are 2 answers

Answer by Julian (accepted answer)

Based on the add_rows argument (https://vincentarelbundock.github.io/modelsummary/articles/datasummary.html#add_rows):

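# One extra row with the same number of columns as the main table
# (row label, Mean, SD, N); the unique count goes in the N column
new_row <- data.frame("var1",
                      "-",
                      "-",
                      length(unique(df$var1)))

datasummary(var2 + var3 ~ Mean + SD + N, data = df,
            add_rows = new_row)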
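The extra row has to have exactly as many columns as the main table (row label, Mean, SD, N). A hypothetical helper, count_row(), can build such a row for any column count. It is a base R sketch, and labeling the row "var1 (unique)" is only a suggestion so that 26 is not read as an observation count under the N heading:

```r
# Hypothetical helper: a one-row data.frame meant for datasummary()'s add_rows.
# The label goes first, the value goes last, every column in between is `fill`.
count_row <- function(label, value, n_cols, fill = "-") {
  cells <- c(label, rep(fill, n_cols - 2), as.character(value))
  data.frame(matrix(cells, nrow = 1), stringsAsFactors = FALSE)
}

var1 <- rep(LETTERS, 5)

# The main table has 4 columns: row label, Mean, SD, N
new_row <- count_row("var1 (unique)", length(unique(var1)), n_cols = 4)
new_row
```

With modelsummary loaded and the question's df, the row would then be passed as in the answer: datasummary(var2 + var3 ~ Mean + SD + N, data = df, add_rows = new_row).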
Answer by Vincent

Mixing factor and numeric variables in datasummary() is kind of tricky. Here are two options.

The first approach is to create a first table with output="data.frame", and to feed it to the add_rows argument of a second table, inserting “empty” columns as necessary to align the two tables:

library(modelsummary)

var1<-rep(LETTERS[1:5], 5)
var2<-rep(LETTERS[8:12], 5)
var3<-rnorm(length(var1), mean=50, sd=10)
var4<-rnorm(length(var1), mean=10, sd=5)
df<-data.frame(var1, var2, var3, var4)

# function to insert empty columns
empty <- function(...) ""

ar <- datasummary(var1 + var2 ~ empty + empty + N,
                  data = df,
                  output = "data.frame")

datasummary(var3 + var4 ~ Heading("") * empty + Mean + SD + N,
            data = df,
            add_rows = ar)
                Mean    SD      N
var3            52.66   9.35   25
var4             9.21   5.25   25
var1    A                       5
        B                       5
        C                       5
        D                       5
        E                       5
var2    H                       5
        I                       5
        J                       5
        K                       5
        L                       5
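The rows passed to add_rows must have the same number of columns as the main table, which is why the first approach pads the categorical table with “empty” columns. The same alignment can be done by hand in base R. This sketch only mimics the idea with one numeric and one categorical variable; the column names label/level/Mean/SD/N are made up, and it is not how modelsummary builds its tables internally:

```r
set.seed(1)  # arbitrary seed, only to make the numeric column reproducible
df <- data.frame(var1 = rep(LETTERS[1:5], 5),
                 var3 = rnorm(25, mean = 50, sd = 10))

# Numeric block: one row, with "" in the level column
num_tab <- data.frame(label = "var3", level = "",
                      Mean = sprintf("%.2f", mean(df$var3)),
                      SD = sprintf("%.2f", sd(df$var3)),
                      N = as.character(sum(!is.na(df$var3))))

# Categorical block: one row per level, with "" in the Mean and SD columns
counts <- table(df$var1)
cat_tab <- data.frame(label = c("var1", rep("", length(counts) - 1)),
                      level = names(counts),
                      Mean = "", SD = "",
                      N = as.character(as.vector(counts)))

# Same column names and column count, so the two blocks stack cleanly
combined <- rbind(num_tab, cat_tab)
combined
```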

The second approach is to use the datasummary_balance template function with ~1 as a formula argument. This is of course less flexible, but it works for simple cases:

datasummary_balance(~ 1, data = df)
                Mean    Std. Dev.
var3            52.7    9.4
var4             9.2    5.2
                N       Pct.
var1    A       5       20.0
        B       5       20.0
        C       5       20.0
        D       5       20.0
        E       5       20.0
var2    H       5       20.0
        I       5       20.0
        J       5       20.0
        K       5       20.0
        L       5       20.0
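The categorical block of the datasummary_balance() output is just a count and a percentage per level, and the number of rows in that block is the number of unique values the question asks about. A base R sketch that reproduces the N and Pct. columns for var1 from this answer's data (the column name Pct is an assumption made for illustration):

```r
var1 <- rep(LETTERS[1:5], 5)

counts <- table(var1)                      # N per level
pct <- round(100 * prop.table(counts), 1)  # Pct. per level, 1 decimal place
cat_summary <- data.frame(level = names(counts),
                          N = as.vector(counts),
                          Pct = as.vector(pct))
cat_summary

length(counts)  # number of distinct values of var1: 5 here
```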