Why is there a dramatic difference between aov and lmer?


I have a mixed model and the data looks like this:

> head(pce.ddply)
  subject Condition errorType     errors
1    j202         G         O 0.00000000
2    j202         G         P 0.00000000
3    j203         G         O 0.08333333
4    j203         G         P 0.00000000
5    j205         G         O 0.16666667
6    j205         G         P 0.00000000

Each subject provides two data points for errorType (O or P), and each subject is in either Condition G (N = 30) or Condition N (N = 33). errorType is a within-subject (repeated) variable and Condition is a between-subject variable. I'm interested in both main effects and the interaction. So, first an ANOVA:

> summary(aov(errors ~ Condition * errorType + Error(subject/(errorType)),
                 data = pce.ddply))

Error: subject
          Df  Sum Sq  Mean Sq F value Pr(>F)
Condition  1 0.00507 0.005065   2.465  0.122
Residuals 61 0.12534 0.002055               

Error: subject:errorType
                    Df  Sum Sq Mean Sq F value   Pr(>F)    
errorType            1 0.03199 0.03199   10.52 0.001919 ** 
Condition:errorType  1 0.04010 0.04010   13.19 0.000579 ***
Residuals           61 0.18552 0.00304                     

Condition is not significant, but errorType and the Condition:errorType interaction are.

However, when I use lmer, I get a totally different set of results:

> lmer(errors ~ Condition * errorType + (1 | subject),
                    data = pce.ddply)
Linear mixed model fit by REML 
Formula: errors ~ Condition * errorType + (1 | subject) 
   Data: pce.ddply 
    AIC    BIC logLik deviance REMLdev
 -356.6 -339.6  184.3     -399  -368.6
Random effects:
 Groups   Name        Variance Std.Dev.
 subject  (Intercept) 0.000000 0.000000
 Residual             0.002548 0.050477
Number of obs: 126, groups: subject, 63

Fixed effects:
                       Estimate Std. Error t value
(Intercept)            0.028030   0.009216   3.042
ConditionN             0.048416   0.012734   3.802
errorTypeP             0.005556   0.013033   0.426
ConditionN:errorTypeP -0.071442   0.018008  -3.967

Correlation of Fixed Effects:
            (Intr) CndtnN errrTP
ConditionN  -0.724              
errorTypeP  -0.707  0.512       
CndtnN:rrTP  0.512 -0.707 -0.724

So for lmer, Condition and the interaction look significant (|t| > 2), but errorType does not.

Also, the lmer result is exactly the same as a glm result, leading me to believe something is wrong.

Can someone please help me understand why they are so different? I suspect I am using lmer incorrectly (though I've tried many other versions, like (errorType | subject), with similar results).

There are 2 answers

beandip

Another reason for the difference is the error structure: you are treating errorType purely as a fixed effect in lmer (which is consistent with your description of the design), but your aov() call's Error(subject/errorType) term additionally introduces a subject:errorType error stratum, in effect a random effect of errorType nested within subject, so errorType and the interaction are tested against that within-subject stratum rather than the overall residual.

I would trust the results of your lmer() call over those of your aov() call. If you ever check back here, try rerunning your aov as aov(errors ~ Condition * errorType + Error(subject), data = pce.ddply). Since your design is close to balanced, I expect aov() would then give you estimates similar to those from lmer().
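A minimal sketch of that comparison, assuming the same pce.ddply data frame and the lme4 package as in the question:

```r
library(lme4)

# aov with only a between-subject error stratum: errorType and the
# interaction are then tested against the pooled within-subject residual
fit.aov <- aov(errors ~ Condition * errorType + Error(subject),
               data = pce.ddply)
summary(fit.aov)

# the lmer analogue with a random intercept per subject; with one
# observation per subject-by-errorType cell, the subject:errorType
# stratum of the original aov coincides with the residual here
fit.lmer <- lmer(errors ~ Condition * errorType + (1 | subject),
                 data = pce.ddply)
summary(fit.lmer)
```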

Maxim.K

I believe the answer is that the two methods take different approaches to dealing with variance. ANOVA partitions variance, and the repeated-measures grouping variable is simply another factor in that partitioning. ANOVA assumes homogeneity of group variances (homoscedasticity), and if this assumption is badly violated, ANOVA may not be the right method.

lmer, on the other hand, is essentially a function for multilevel modeling. Within the multilevel framework you model the variance explicitly, drawing the distinction between fixed effects and random effects (variance components, basically). Heteroscedasticity between groups is less of a problem here.

Another way to look at it is that ANOVA takes the no-pooling approach (each group is estimated separately), while lmer takes the partial-pooling approach (group estimates borrow strength from each other and are shrunk toward the overall mean).
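A minimal sketch of that contrast, assuming lme4 is loaded and the pce.ddply data frame from the question is available:

```r
# no pooling: each subject's mean is estimated from that subject's
# own observations alone
no.pool <- aggregate(errors ~ subject, data = pce.ddply, FUN = mean)

# partial pooling: lmer shrinks each subject's intercept toward the
# population mean, by an amount driven by the estimated between-subject
# variance (here essentially zero, so shrinkage is nearly complete)
fit <- lmer(errors ~ Condition * errorType + (1 | subject),
            data = pce.ddply)
partial.pool <- coef(fit)$subject  # per-subject intercepts after shrinkage
```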

In addition, ANOVA uses OLS estimation, while lmer uses a form of maximum likelihood (REML in your case).
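One concrete consequence in this data: the estimated subject variance is zero, so the mixed model carries no between-subject information and its REML fixed effects coincide with plain OLS, which is presumably why the lmer output matched glm in the question. A sketch, assuming the same data frame:

```r
# with Var(subject intercept) estimated at 0, the GLS weighting in
# lmer reduces to ordinary least squares
fit.ols <- lm(errors ~ Condition * errorType, data = pce.ddply)
coef(fit.ols)  # compare with the lmer fixed-effects table
```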

This is the best I can explain it, and it should be enough to at least send you in the right direction in your own reading on the problem. For a more elaborate answer, you might indeed want to ask the question on CrossValidated.