Predicting from glmmTMB with truncated counts using link predictions

145 views Asked by At

This question is a direction follow-up to the question asked here, using the code provided in the answer. I'm running a glmmTMB with a truncated nbinom2 family. I would like to make predictions on the link scale, to be able to calculate the 95% CIs, prior to back-transforming to the response scale. I can't figure out how to do that with the truncated_nbinom2 family.

Example (as shown in the linked question/answer):

library(dplyr)
library(glmmTMB)

set.seed(1)
df <- data.frame(Group = rep(c("a", "b"), each = 20))

## clunky trunc nbinom generator
tnb <- rep(0, 40)
z <- (tnb==0)
while(any(z)) {
    tnb[z] <- rnbinom(sum(z), mu = 1, size = 1)
    z <- (tnb==0)
}
df$Nnb <- tnb
## summarize
df %>% group_by(Group) %>% summarize(across(starts_with("N"), mean))

m <- glmmTMB(Nnb ~ Group, data = df, family = "truncated_nbinom2")
df %>% group_by(Group) %>% summarize(mean(Nnb))
predict(m, newdata = data.frame(Group = c("a", "b")), type = "response")

If this wasn't a truncated model, I would be able to get the response by calculating the conditional values on the link scale and multiplying by the probability of 1s (as shown below). But of course the resulting values don't match the response-scale predictions from the models, because of the zero truncation. How do I do what is shown below, but accounting for the truncated_nbinom2 family?

mu <- predict(m, newdata = data.frame(Group = c("a", "b")), type = "link", se.fit = TRUE)
zi <- predict(m, newdata = data.frame(Group = c("a", "b")), type = "zlink", se.fit = TRUE)

Pred.response <- exp(mu$fit)*(1 - plogis(zi$fit))
Lower.response <- exp(mu$fit - 1.96*mu$se.fit) * (1 - plogis(zi$fit - 1.96*zi$se.fit))
Upper.response <- exp(mu$fit + 1.96*mu$se.fit) * (1 -plogis(zi$fit + 1.96*zi$se.fit))

Pred.response
1

There are 1 answers

0
Daniel On BEST ANSWER

I think your issue here is similar to what is described here: https://strengejacke.github.io/ggeffects/articles/introduction_randomeffects.html#marginal-effects-for-zero-inflated-mixed-models

Let me cite the relevant paragraph:

For type = "zero_inflated", the predicted response value is the expected value mu*(1-p) without conditioning on random effects. Since the zero inflation and the conditional model are working in “opposite directions”, a higher expected value for the zero inflation means a lower response, but a higher value for the conditional model means a higher response. While it is possible to calculate predicted values with predict(..., type = "response"), standard errors and confidence intervals can not be derived directly from the predict()-function. Thus, confidence intervals for type = "zero_inflated" are based on quantiles of simulated draws from a multivariate normal distribution (see also Brooks et al. 2017, pp.391-392 for details).

The referenced paper (which is from the glmmTMB authors) can be downloaded here: https://journal.r-project.org/archive/2017/RJ-2017-066/RJ-2017-066.pdf

This reflects the authors' view on this issue:

Getting accurate CI on the response will involve simulation, such as parametric bootstrapping (see appendix A, section "Alternative prediction method") or using the simulate() function. It's possible that @bbolker or other people have other ideas.

(the longer discussion is here: https://github.com/glmmTMB/glmmTMB/issues/378 - read to the end of that issue!)

This is what is implemented in the ggeffects package:

library(ggeffects)
library(extraDistr)
library(glmmTMB)

set.seed(1)
df <- data.frame(Group = rep(c("a", "b"), each = 20), Np = rtpois(40, 1, a = 0))

## clunky trunc nbinom generator
tnb <- rep(0, 40)
z <- (tnb==0)
while (any(z)) {
  tnb[z] <- rnbinom(sum(z), mu = 1, size = 1)
  z <- (tnb == 0)
}
df$Nnb <- tnb

m <- glmmTMB(Np ~ Group, ziformula = ~1, data = df, family = truncated_nbinom2)
#> Warning in finalizeTMB(TMBStruc, obj, fit, h, data.tmb.old): Model convergence
#> problem; false convergence (8). See vignette('troubleshooting'),
#> help('diagnose')
ggpredict(m, "Group", type = "zero_inflated")
#> # Predicted counts of Np
#> 
#> Group | Predicted |       95% CI
#> --------------------------------
#> a     |      1.75 | [0.91, 2.59]
#> b     |      1.45 | [0.74, 2.16]

Created on 2023-11-03 with reprex v2.0.2

Note that, since simulations are involved, the CIs slightly change each time you re-run ggpredict(), so you need set.seed() to replicate identical results.