Post-hoc test for glmer


I'm analysing my binomial dataset with R using a generalized linear mixed model (glmer, lme4-package). I wanted to make the pairwise comparisons of a certain fixed effect ("Sound") using a Tukey's post-hoc test (glht, multcomp-package).

Most of it is working fine, but for one level of my fixed effect ("SoundC") the response shows no variation at all (96 times a "1" and zero times a "0"), and it seems that Tukey's test cannot handle that. All pairwise comparisons involving "SoundC" give a p-value of 1.000, whereas some are clearly significant.

As a validation, I changed one of the 96 "1"s to a "0"; after that I got normal p-values again and significant differences where I expected them, even though my manual change had actually made the difference smaller.

Does anybody have a solution? If not, is it fine to use the results of my modified dataset and report my manual change?

Reproducible example:

Response <- c(1,0,1,1,0,1,1,0,1,1,0,1,1,0,1,1,1,1,0,
              0,1,1,0,1,1,0,1,1,0,1,1,0,1,1,0,1,1,0,1,1,0,
              1,1,0,1,1,0,1,1,1,1,0,0,1,1,0,1,1,0,1,1,0,1,1,0,1)    
Data <- data.frame(Sound=rep(paste0('Sound',c('A','B','C')),22),
                   Response,
                   Individual=rep(rep(c('A','B'),2),rep(c(18,15),2)))


# Visual
boxplot(Response ~ Sound, data = Data)

# Mixed model
library(lme4)
model10 <- glmer(Response~Sound + (1|Individual), Data, family=binomial)

# Post-hoc analysis
library(multcomp)
summary(glht(model10, mcp(Sound="Tukey")))
1 Answer

Answer by Ben Bolker (accepted):

This is verging on a CrossValidated question; you are definitely seeing complete separation, where there is a perfect division of your response into 0 vs 1 results. This leads to (1) infinite values of the parameters (they're only listed as non-infinite due to computational imperfections) and (2) crazy/useless values of the Wald standard errors and corresponding $p$ values (which is what you're seeing here). Discussion and solutions have been given in several related answers, but I'll illustrate a little more below.
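Complete separation is easy to spot before fitting anything: cross-tabulate the response against the predictor and look for a zero cell in some level. A minimal sketch with a small made-up data frame (`toy` is illustrative, not the asker's `Data`):

```r
# Cross-tabulating response vs. predictor reveals separation:
# a level whose column of 0s (or 1s) is empty is perfectly separated.
toy <- data.frame(Sound    = rep(c("SoundA", "SoundB", "SoundC"), each = 4),
                  Response = c(1, 0, 1, 1,   # SoundA: mixed
                               0, 1, 0, 0,   # SoundB: mixed
                               1, 1, 1, 1))  # SoundC: all 1s -> separated
with(toy, table(Sound, Response))
# The SoundC row has a zero in the "0" column, so the SoundC
# coefficient is driven to infinity in an ordinary logistic fit.
```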

To be a statistical grouch for a moment: you really shouldn't be trying to fit a random effect with so few levels anyway (here Individual has only two; see e.g. http://glmm.wikidot.com/faq) ...

Firth-corrected logistic regression:

library(logistf)
L1 <- logistf(Response ~ Sound*Individual, data = Data,
              contrasts.arg = list(Sound = "contr.treatment",
                                   Individual = "contr.sum"))

                                 coef se(coef)            p
(Intercept)              3.218876e+00 1.501111 2.051613e-04 
SoundSoundB             -4.653960e+00 1.670282 1.736123e-05 
SoundSoundC             -1.753527e-15 2.122891 1.000000e+00 
IndividualB             -1.995100e+00 1.680103 1.516838e-01 
SoundSoundB:IndividualB  3.856625e-01 2.379919 8.657348e-01 
SoundSoundC:IndividualB  1.820747e+00 2.716770 4.824847e-01

Standard errors and p-values are now reasonable (p-value for the A vs C comparison is 1 because there is literally no difference ...)

Mixed Bayesian model with weak priors:

library(blme)
model20 <- bglmer(Response~Sound + (1|Individual), Data, family=binomial,
                  fixef.prior = normal(cov = diag(9,3)))

##              Estimate Std. Error    z value     Pr(>|z|)
## (Intercept)  1.711485   2.233667  0.7662221 4.435441e-01
## SoundSoundB -5.088002   1.248969 -4.0737620 4.625976e-05
## SoundSoundC  2.453988   1.701674  1.4421024 1.492735e-01

The specification diag(9,3) of the fixed-effect variance-covariance matrix produces

$$ \left( \begin{array}{ccc} 9 & 0 & 0 \\ 0 & 9 & 0 \\ 0 & 0 & 9 \end{array} \right) $$

In other words, the 3 specifies the dimension of the matrix (equal to the number of fixed-effect parameters), and the 9 specifies the variance -- this corresponds to a standard deviation of 3, or a 95% range of about $\pm 6$, which is quite large/weak/uninformative for logit-scaled responses.
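The `diag(9, 3)` construction above can be checked directly in base R; this is exactly the matrix shown, and its diagonal gives back the implied prior standard deviation:

```r
# diag(9, 3) builds a 3x3 matrix with 9 on the diagonal:
# the prior covariance for the three fixed-effect parameters.
V <- diag(9, 3)
V
sqrt(V[1, 1])  # implied prior standard deviation for each coefficient: 3
```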

The resulting Tukey contrasts are roughly consistent with the Firth-corrected results (even though the two models are quite different):

library(multcomp)
summary(glht(model20, mcp(Sound="Tukey")))

##                     Estimate Std. Error z value Pr(>|z|)    
## SoundB - SoundA == 0   -5.088      1.249  -4.074 0.000124 ***
## SoundC - SoundA == 0    2.454      1.702   1.442 0.309216    
## SoundC - SoundB == 0    7.542      1.997   3.776 0.000397 ***
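As a sanity check, each Tukey contrast here is just a difference of the fixed-effect coefficients (treatment contrasts, with SoundA as the baseline). Using the values copied from the `bglmer` summary above:

```r
# Coefficients copied from the bglmer summary above (SoundA is the baseline):
b_B <- -5.088002   # SoundB - SoundA
b_C <-  2.453988   # SoundC - SoundA
b_C - b_B          # SoundC - SoundB: reproduces the 7.542 reported by glht
```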

As I said above, I would not recommend a mixed model in this case anyway ...