Random factors in glm not defined because of singularities


I am trying to build a GLMM to fit my data but for some reason all my random effects come back as "not defined because of singularities".

I understand that this would indicate that they are perfectly predicted by another variable, but these variables are time of day, date, and individual ID and are not easily correlated with each other or any other variable. I have been adding them to the model as ...+ (1|randomeffect).

I have tried just including one and not the others, but I get this error regardless. The rest of the model runs fine.

Here is the model and the output:

Call:
glm(formula = df$Sex ~ df$`Low Freq (KHz)` + df$`Full Song Duration` + 
    (1 | df$Individual) + (1 | df$TOD) + (1 | df$DATER), family = binomial(link = "logit"), 
    data = df)

Deviance Residuals: 
     Min        1Q    Median        3Q       Max  
-1.95539  -0.18003   0.02514   0.10766   2.16469  

Coefficients: (3 not defined because of singularities)
                        Estimate Std. Error z value Pr(>|z|)    
(Intercept)               4.2354     1.0846   3.905 9.42e-05 ***
df$`Low Freq (KHz)`      -0.7999     0.3923  -2.039   0.0414 *  
df$`Full Song Duration`   5.2124     1.2008   4.341 1.42e-05 ***
1 | df$IndividualTRUE         NA         NA      NA       NA    
1 | df$TODTRUE                NA         NA      NA       NA    
1 | df$DATERTRUE              NA         NA      NA       NA  

There is 1 answer

Ben Bolker On

Your problem is that you're not actually fitting a GLMM: that's not what glm() does. You probably wanted:

library(lme4)
glmer(formula = Sex ~ `Low Freq (KHz)` + `Full Song Duration` +
          (1 | Individual) + (1 | TOD) + (1 | DATER),
      family = binomial(link = "logit"),
      data = df)
  • because it doesn't know about random effects, glm() interprets a term like 1|TOD as a literal logical "or". In this context 0 is treated as FALSE and any other number as TRUE, so 1|x is always TRUE. You therefore ended up with several extra columns of 1s (converted back from TRUE) in your model matrix, all of which are collinear with the intercept, which is why their coefficients come back as NA.
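
To see the mechanism for yourself, here is a minimal sketch on made-up data. The data frame toy and its columns y, x and TOD are invented for illustration and are not the asker's data; TOD is assumed to be numeric, as the TRUE suffix in the question's output suggests.

```r
# Toy data, made up for illustration only (not the data from the question)
set.seed(1)
toy <- data.frame(
  y   = rbinom(20, 1, 0.5),
  x   = rnorm(20),
  TOD = rep(1:4, times = 5)   # assumed numeric grouping variable
)

# `|` is logical OR: 1 is coerced to TRUE, so the result is TRUE in every row
or_term <- 1 | toy$TOD
all(or_term)   # TRUE

# In the model matrix the "random effect" becomes one extra column of 1s,
# identical to the intercept column
X <- model.matrix(y ~ x + (1 | TOD), data = toy)
colnames(X)
all(X[, ncol(X)] == X[, "(Intercept)"])

# glm() cannot estimate a column that duplicates the intercept,
# so that coefficient is reported as NA
fit <- glm(y ~ x + (1 | TOD), family = binomial, data = toy)
coef(fit)
```

The last coefficient should come back as NA, the same pattern as the three NA rows in the question's summary.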

Some slightly tangential suggestions:

  • it is recommended not to use df$... inside GLM(M) formulae; R knows enough to take these variables from the data frame provided
  • in general I would suggest converting your variable names to something that can be used without backquotes, e.g. low_freq and full_duration (but this is admittedly a matter of taste)
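
As a rough sketch of both suggestions, assuming the column names shown in the question (the values below are made up, and low_freq and full_duration are just the example names from above):

```r
# Toy data frame with the awkward column names from the question
# (values are invented for illustration)
df <- data.frame(
  `Low Freq (KHz)`     = c(2.1, 3.4, 2.8),
  `Full Song Duration` = c(0.9, 1.2, 1.1),
  check.names = FALSE   # keep the non-syntactic names as they are
)

# Rename to syntactic names so no backquotes are needed in formulae
names(df)[names(df) == "Low Freq (KHz)"]     <- "low_freq"
names(df)[names(df) == "Full Song Duration"] <- "full_duration"
names(df)

# The formula refers to bare column names: no df$ and no backquotes
f <- Sex ~ low_freq + full_duration + (1 | Individual) + (1 | TOD) + (1 | DATER)
all.vars(f)
```

With the real data frame (which also contains Sex, Individual, TOD and DATER), a formula like f could then be passed to glmer() together with data = df, as in the call above.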