How does gtsummary produce confidence intervals and standard error statistics for glm models? (Code Examples Included)

Question

How does gtsummary produce confidence intervals and standard error statistics for glm models? (Code Examples Included)

224 views Asked by Triage At 14 April 2022 at 22:08

Want to preface this with heaps of appreciate for gtsummary -- wonderful package.

After using tidymodels, GLM, and gtsummary for a while, I've been trying to understand gtsummary's computations for GLM model performance and confidence intervals.

Can the anyone and/or Dr. Sjoberg + gtsummary team explain the following questions 1 & 2

Question 1: Why are standard errors different when using broom::tidy() vs. parameters::model_parameters() functions to extract model residual data? (Bolded text in print outs shows differences)

library(gtsummary)
library(parameters)
library(rsample)
library(broom)


trial2 <- trial %>% select(age, grade, response, trt) %>%
  drop_na()


model_trial2 <- glm(response ~ age + grade + trt, 
                    data = trial2, 
                    family=binomial(link="logit"))

broom::tidy(model_trial2, exponentiate = TRUE)

# # A tibble: 5 × 5
# term            estimate  std.error statistic  p.value
# <chr>           <dbl>     <dbl>     <dbl>     <dbl>
# 1 (Intercept)    0.184    **0.630**    -2.69      0.00715
# 2 age            1.02     0.0114    1.67      0.0952 
# 3 gradeII        0.852    **0.395**    -0.406     0.685  
# 4 gradeIII       1.01     0.385     0.0199    0.984  
# 5 trtDrug B      1.13     **0.321**     0.387     0.699  

preadmission_model_parameters <- model_trial2 %>% parameters::model_parameters(exponentiate = TRUE) 


preadmission_model_parameters
# Parameter    | Odds Ratio |   SE |       95% CI |     z |     p
# ---------------------------------------------------------------
# (Intercept)  |       0.18 | **0.12** | [0.05, 0.61] | -2.69 | 0.007
# age          |       1.02 | 0.01 | [1.00, 1.04] |  1.67 | 0.095
# grade [II]   |       0.85 | **0.34** | [0.39, 1.85] | -0.41 | 0.685
# grade [III]  |       1.01 | 0.39 | [0.47, 2.15] |  0.02 | 0.984
# trt [Drug B] |       1.13 | **0.36** | [0.60, 2.13] |  0.39 | 0.699

Question 2: (a) What method does gtsummary use to produce confidence intervals? (b) can the user define (stratified or unstratified) k-fold cross-validation or bootstraps to produce confidence intervals?

(Bolded differences in confidence intervals for the reg_intervals() bootstrapped confidence intervals and the unknown method gtsummary tbl_regression() confidence intervals.)

library(gtsummary)
library(parameters)
library(rsample)
library(broom)

trial2 <- trial %>% select(age, grade, response, trt) %>%
  drop_na()

bootstraps(trial2, times = 10)

trial_bootrapped_confidence_intervals <- reg_intervals(response ~ age + grade + trt, 
                                                  data = trial2,
                                                  model_fn = "glm",
                                                  keep_reps = TRUE, 
                                                  family=binomial(link="logit"))

trial_bootrapped_confidence_intervals_exp <- trial_bootrapped_confidence_intervals %>%
  select(term:.alpha) %>%
  mutate(across(.cols = c(.lower, .estimate, .upper), ~exp(.))) %>%
  as_tibble()

trial_bootrapped_confidence_intervals_exp
# # A tibble: 4 × 5
# term       .lower .estimate .upper .alpha
# <chr>       <dbl>     <dbl>  <dbl>  <dbl>
# 1 age        0.997     1.02    1.04   0.05
# 2 gradeII    **0.400**     0.846   **1.86**   0.05
# 3 gradeIII   0.473     1.01    2.10   0.05
# 4 trtDrug B  0.600     1.14    2.22   0.05

model_trial2_tbl_regression <- 
  glm(response ~ age + grade + trt, 
      data = trial2,
      family=binomial(link="logit")) %>%
  tbl_regression(
    exponentiate = T
  ) %>%
  add_global_p() 

model_trial2_tbl_regression_metrics <- model_trial2_tbl_regression$table_body %>%
  select(
    label, 
    estimate, 
    std.error, 
    statistic, 
    conf.low , 
    conf.high, 
    p.value
  )

model_trial2_tbl_regression_metrics
# A tibble: 8 × 7
# label                  estimate std.error statistic conf.low conf.high p.value
# <chr>                     <dbl>     <dbl>     <dbl>    <dbl>     <dbl>   <dbl>
# 1 Age                       1.02     0.0114    1.67      0.997      1.04  0.0909
# 2 Grade                    NA       NA        NA        NA         NA     0.894 
# 3 I                        NA       NA        NA        NA         NA    NA     
# 4 II                        0.852    0.395    -0.406     **0.389**      **1.85** NA     
# 5 III                       1.01     0.385     0.0199    0.472      2.15 NA     
# 6 Chemotherapy Treatment   NA       NA        NA        NA         NA     0.699 
# 7 Drug A                   NA       NA        NA        NA         NA    NA     
# 8 Drug B                    1.13     0.321     0.387     0.603      2.13 NA

Original Q&A

There are 1 answers

**Isaiah** · Answer 1 · 2022-12-23T10:52:53+00:00

The issue is with the exponentiation (applied as the family is binomial). Broom::tidy does not exponentiate the standard errors but parameters does. You can see this with broom::tidy(model_trial2, exponentiate = TRUE) and broom::tidy(model_trial2, exponentiate = FALSE), which return the same standard errors. parameters::model_parameters(exponentiate = TRUE) and parameters::model_parameters(exponentiate = FALSE) return different standard errors. When exponentiate is FALSE for parameters, the standard errors match. This is discussed in Check exponentiate behavior in tidy methods #422

To create a custom tidier for gtsummary, see FAQ + Gallery

TechQA.

How does gtsummary produce confidence intervals and standard error statistics for glm models? (Code Examples Included)

There are 1 answers

Related Questions in R

Related Questions in CONFIDENCE-INTERVAL

Related Questions in GTSUMMARY

Related Questions in BROOM

Related Questions in RSAMPLE

Popular Questions

Trending Questions