ANCOVA in R: shouldn't an ANCOVA and a regression of residuals give the same results?

I have a question that is really intriguing me. I'm not sure whether the problem lies in my understanding of analysis of covariance (ANCOVA) or in R; probably the former. In short, I believe a regular ANCOVA should give the same results (sums of squares and F values) as doing the calculations by hand (running ANOVAs and then regressing the residuals on each other), right? Let's go through an example:

Consider this basic dataset:

> datas
   Pen Sex Ration  X     Y
1    1   M      A 38  9.52
2    1   F      A 48  9.94
3    1   M      B 39  8.51
4    1   F      B 48 10.00
5    1   M      C 48  9.11
6    1   F      C 48  9.75
7    2   M      A 35  8.21
8    2   F      A 32  9.48
9    2   M      B 38  9.95
10   2   F      B 32  9.24
11   2   M      C 37  8.50
12   2   F      C 28  8.66
13   3   M      A 41  9.32
14   3   F      A 35  9.32
15   3   M      B 46  8.43
16   3   F      B 41  9.34
17   3   M      C 42  8.90
18   3   F      C 33  7.63
> 
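To make the example reproducible, the printed table can be entered directly as shown here. This is a sketch, and the column types are inferred from the output. In particular, Pen has to be a factor, because the ANOVA tables give it 2 df; an integer Pen column would get only 1.

```r
# Data from the question: Pen, Sex and Ration are factors,
# X is the covariate and Y the response
datas <- data.frame(
  Pen    = factor(rep(1:3, each = 6)),
  Sex    = factor(rep(c("M", "F"), times = 9)),
  Ration = factor(rep(rep(c("A", "B", "C"), each = 2), times = 3)),
  X = c(38, 48, 39, 48, 48, 48, 35, 32, 38, 32, 37, 28, 41, 35, 46, 41, 42, 33),
  Y = c(9.52, 9.94, 8.51, 10.00, 9.11, 9.75, 8.21, 9.48, 9.95,
        9.24, 8.50, 8.66, 9.32, 9.32, 8.43, 9.34, 8.90, 7.63)
)
str(datas)
```

Every Pen x Sex x Ration cell holds exactly one observation, so the three factors are mutually orthogonal. The covariate X is not orthogonal to them, as the large Pen sum of squares in the ANOVA for X shows.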

Now, fit an ANCOVA:

ancova1 = lm(Y ~ Pen + X + Sex + Ration, data=datas)
> anova(ancova1)
Analysis of Variance Table

Response: Y
          Df Sum Sq Mean Sq F value Pr(>F)
Pen        2 1.3403 0.67017  2.1851 0.1588
X          1 0.7337 0.73372  2.3923 0.1502
Sex        1 0.8773 0.87728  2.8604 0.1189
Ration     2 1.1741 0.58703  1.9140 0.1935
Residuals 11 3.3737 0.30670  

Keep in mind the F values for Sex and Ration. Now let's do the computation by hand.

First, an ANOVA for Y:

Yreg = lm(Y ~ Pen + Sex + Ration, data=datas)
> anova(Yreg)
Analysis of Variance Table

Response: Y
          Df Sum Sq Mean Sq F value Pr(>F)
Pen        2 1.3403 0.67017  1.7386 0.2172
Sex        1 0.4705 0.47045  1.2204 0.2909
Ration     2 1.0626 0.53129  1.3783 0.2892
Residuals 12 4.6257 0.38548 

Second, an ANOVA for X:

Xreg = lm(X ~ Pen + Sex + Ration, data=datas)
> anova(Xreg)
Analysis of Variance Table

Response: X
          Df Sum Sq Mean Sq F value   Pr(>F)   
Pen        2 374.78 187.389  8.4325 0.005162 **
Sex        1  20.06  20.056  0.9025 0.360854   
Ration     2  18.78   9.389  0.4225 0.664787   
Residuals 12 266.67  22.222    

Third, regress errorY (the residuals of Y) on errorX (the residuals of X) to get the slope for the covariate adjustment:

errorY = resid(Yreg)
errorX = resid(Xreg)
errorreg = lm(errorY ~ errorX)
> coef(errorreg)
 (Intercept)       errorX 
5.465170e-18 6.852083e-02 

Note that the slope for errorX is the same as the coefficient of X in the first ANCOVA (0.06852083).
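The equal slopes are not a coincidence. This is the Frisch-Waugh-Lovell theorem. Regressing the residuals of Y on the residuals of X, both taken from the same factors-only model, gives exactly the coefficient of X in the full model. It also gives the same residuals. The intercept is zero up to rounding (5.465170e-18) because both residual vectors have mean zero. Here is a minimal check in R, re-entering the data from the question:

```r
datas <- data.frame(
  Pen    = factor(rep(1:3, each = 6)),
  Sex    = factor(rep(c("M", "F"), times = 9)),
  Ration = factor(rep(rep(c("A", "B", "C"), each = 2), times = 3)),
  X = c(38, 48, 39, 48, 48, 48, 35, 32, 38, 32, 37, 28, 41, 35, 46, 41, 42, 33),
  Y = c(9.52, 9.94, 8.51, 10.00, 9.11, 9.75, 8.21, 9.48, 9.95,
        9.24, 8.50, 8.66, 9.32, 9.32, 8.43, 9.34, 8.90, 7.63)
)

ancova1  <- lm(Y ~ Pen + X + Sex + Ration, data = datas)
Yreg     <- lm(Y ~ Pen + Sex + Ration, data = datas)
Xreg     <- lm(X ~ Pen + Sex + Ration, data = datas)
errorY   <- resid(Yreg)
errorX   <- resid(Xreg)
errorreg <- lm(errorY ~ errorX)

# Same covariate slope from the full model and from the residual regression
slope_ancova <- unname(coef(ancova1)["X"])
slope_resid  <- unname(coef(errorreg)["errorX"])
all.equal(slope_ancova, slope_resid)

# Same residuals as well, hence the same residual sum of squares
all.equal(unname(resid(ancova1)), unname(resid(errorreg)))
```

Both all.equal() calls return TRUE.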

Now it is time for a change of variable:

datas$Z = datas$Y - 0.06852083 * datas$X

Finally, let's run an ANOVA on Z:

Zreg = lm(Z ~ Pen + Sex + Ration, data=datas)
> anova(Zreg)
Analysis of Variance Table

Response: Z
          Df Sum Sq Mean Sq F value  Pr(>F)  
Pen        2 1.0602 0.53009  1.8855 0.19406  
Sex        1 0.9856 0.98556  3.5056 0.08573 .
Ration     2 1.1821 0.59105  2.1023 0.16491  
Residuals 12 3.3737 0.28114        

Here is my doubt: shouldn't the F values for Sex and Ration be the same as those from the first, regular ANCOVA? They differ, although the residual sums of squares are the same. The sums of squares for Sex and Ration in the ANOVA on Z also differ from those in the ANCOVA. Why is that?

OK, I understand that the sum of squares for Pen will differ, since in the ANCOVA the Pen effect was not adjusted for the covariate X. But I would not expect different values for Sex and Ration.

Is this an R issue or am I missing something here?

Many thanks for your attention!
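Part of the explanation is that anova() on an lm fit reports sequential (Type I) sums of squares. Each row is the drop in residual SS when that term is added to the terms listed above it. In the ANCOVA, the Sex row is adjusted for Pen and X. In the ANOVA on Z, it is adjusted for Pen only, and for a different response, so the two rows need not agree. The residual SS do agree, but the residual df do not (11 vs 12), because the Z route never charges a degree of freedom for estimating the slope. The following sketch rebuilds both Sex rows from nested fits; rss() is a hypothetical helper used only for illustration.

```r
datas <- data.frame(
  Pen    = factor(rep(1:3, each = 6)),
  Sex    = factor(rep(c("M", "F"), times = 9)),
  Ration = factor(rep(rep(c("A", "B", "C"), each = 2), times = 3)),
  X = c(38, 48, 39, 48, 48, 48, 35, 32, 38, 32, 37, 28, 41, 35, 46, 41, 42, 33),
  Y = c(9.52, 9.94, 8.51, 10.00, 9.11, 9.75, 8.21, 9.48, 9.95,
        9.24, 8.50, 8.66, 9.32, 9.32, 8.43, 9.34, 8.90, 7.63)
)

ancova1 <- lm(Y ~ Pen + X + Sex + Ration, data = datas)
b <- unname(coef(ancova1)["X"])      # within-group slope of Y on X
datas$Z <- datas$Y - b * datas$X
Zreg <- lm(Z ~ Pen + Sex + Ration, data = datas)

# Hypothetical helper: residual sum of squares of a model fitted to datas
rss <- function(f) deviance(lm(f, data = datas))

# Sequential SS for Sex in each table
ss_sex_ancova <- rss(Y ~ Pen + X) - rss(Y ~ Pen + X + Sex)  # adjusted for Pen and X
ss_sex_Z      <- rss(Z ~ Pen)     - rss(Z ~ Pen + Sex)      # adjusted for Pen only
c(ancova = ss_sex_ancova, Z = ss_sex_Z)

# Identical residual SS, but different residual df
c(deviance(ancova1), deviance(Zreg))
c(df.residual(ancova1), df.residual(Zreg))
```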

UPDATED

OK, I see the problem now. As Dason said, part of it is the different degrees of freedom, and there is something more: the change of variable to Z does not account for all of the effect of the covariate X on Y. There seems to be a leftover effect of X on Y (sum of squares 0.4128). Note that, with X added to the model, the ANOVA on Z now looks the same as the ANCOVA on Y for Sex, Ration and the residuals:

Zreg = lm(Z ~ X + Pen + Sex + Ration, data=datas)
> anova(Zreg)
Analysis of Variance Table

Response: Z
          Df Sum Sq Mean Sq F value Pr(>F)
X          1 0.4128 0.41277  1.3459 0.2706
Pen        2 0.7637 0.38187  1.2451 0.3255
Sex        1 0.8773 0.87728  2.8604 0.1189
Ration     2 1.1741 0.58703  1.9140 0.1935
Residuals 11 3.3737 0.30670      
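This can be checked directly. Z differs from Y only by b*X. So once X is in the model, both responses give the same residuals and the same sequential SS for every term added after both X and Pen (here Sex and Ration). Only the X and Pen rows change. The sketch below enters X first and keeps the main effects in the same order as the ANCOVA, with no interaction term.

```r
datas <- data.frame(
  Pen    = factor(rep(1:3, each = 6)),
  Sex    = factor(rep(c("M", "F"), times = 9)),
  Ration = factor(rep(rep(c("A", "B", "C"), each = 2), times = 3)),
  X = c(38, 48, 39, 48, 48, 48, 35, 32, 38, 32, 37, 28, 41, 35, 46, 41, 42, 33),
  Y = c(9.52, 9.94, 8.51, 10.00, 9.11, 9.75, 8.21, 9.48, 9.95,
        9.24, 8.50, 8.66, 9.32, 9.32, 8.43, 9.34, 8.90, 7.63)
)

ancova1 <- lm(Y ~ Pen + X + Sex + Ration, data = datas)
b <- unname(coef(ancova1)["X"])
datas$Z <- datas$Y - b * datas$X

# X entered first, then the factors in the ANCOVA's order
Zreg2 <- lm(Z ~ X + Pen + Sex + Ration, data = datas)
tab_Y <- anova(ancova1)
tab_Z <- anova(Zreg2)

# Sums of squares for Sex, Ration and Residuals agree between the tables
rows <- c("Sex", "Ration", "Residuals")
all.equal(tab_Y[rows, "Sum Sq"], tab_Z[rows, "Sum Sq"])

# ...and so do the F values, since both models keep 11 residual df
all.equal(tab_Y[c("Sex", "Ration"), "F value"], tab_Z[c("Sex", "Ration"), "F value"])
```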

Perhaps a fairer comparison would be between the ANOVA on Z and the ANOVA on Y. But a question remains: is least squares on a general linear model with both categorical factors and a continuous covariate different from regressing errorY on errorX and then changing variables (computing Z and running an ANOVA on Z)? In theory I would expect them to agree. If not, I would at least expect them to lead to the same interpretation of the effects.
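On the last question: the two routes fit the same slope and the same residuals, so the least-squares estimates themselves do not differ. What differs is the bookkeeping: which terms each sum of squares is adjusted for, and how many residual df remain. Sometimes the goal is tests that adjust each term for all the others, whatever its position in the formula. One option in base R is drop1() on the full ANCOVA, which amounts to comparing nested models with anova(). Here is a sketch:

```r
datas <- data.frame(
  Pen    = factor(rep(1:3, each = 6)),
  Sex    = factor(rep(c("M", "F"), times = 9)),
  Ration = factor(rep(rep(c("A", "B", "C"), each = 2), times = 3)),
  X = c(38, 48, 39, 48, 48, 48, 35, 32, 38, 32, 37, 28, 41, 35, 46, 41, 42, 33),
  Y = c(9.52, 9.94, 8.51, 10.00, 9.11, 9.75, 8.21, 9.48, 9.95,
        9.24, 8.50, 8.66, 9.32, 9.32, 8.43, 9.34, 8.90, 7.63)
)
ancova1 <- lm(Y ~ Pen + X + Sex + Ration, data = datas)

# Drop each term from the full model in turn; every F test uses
# the full model's residual mean square on 11 df
marg <- drop1(ancova1, test = "F")
marg

# The Sex test written out as an explicit nested-model comparison
no_sex <- lm(Y ~ Pen + X + Ration, data = datas)
cmp <- anova(no_sex, ancova1)
cmp
```

For the last term in the formula (Ration here), the marginal and sequential sums of squares coincide.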

Answer by Emili

An ANCOVA (a linear model with categorical factor(s) and quantitative covariate(s)) is similar to, but different from, an ANOVA of residuals. The latter is an ad hoc procedure that has been widely criticized: it uses different degrees of freedom, the slope estimate and adjusted values can change, and it is not a maximum likelihood procedure, among other problems. For further explanation, please read:

Darlington, R. B., & Smulders, T. V. (2001). Problems with residual analysis. Animal Behaviour, 62, 599-602.

Freckleton, R. P. (2002). On the misuse of residuals in ecology: regression of residuals vs. multiple regression. Journal of Animal Ecology, 542-545.

García-Berthou, E. (2001). On the misuse of residuals in ecology: testing regression residuals vs. the analysis of covariance. Journal of Animal Ecology, 708-711.
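For contrast, here is a sketch of the residual procedure those references warn against. It assumes the common form in which Y is regressed on X alone and the residuals are then analysed by ANOVA. This differs from the two-residual route in the question in two ways. Its slope comes from Y ~ X, ignoring the factors. Its ANOVA also still reports 12 residual df, although one was spent on the slope.

```r
datas <- data.frame(
  Pen    = factor(rep(1:3, each = 6)),
  Sex    = factor(rep(c("M", "F"), times = 9)),
  Ration = factor(rep(rep(c("A", "B", "C"), each = 2), times = 3)),
  X = c(38, 48, 39, 48, 48, 48, 35, 32, 38, 32, 37, 28, 41, 35, 46, 41, 42, 33),
  Y = c(9.52, 9.94, 8.51, 10.00, 9.11, 9.75, 8.21, 9.48, 9.95,
        9.24, 8.50, 8.66, 9.32, 9.32, 8.43, 9.34, 8.90, 7.63)
)

# ANCOVA: the slope for X is estimated jointly with the factors
ancova1 <- lm(Y ~ Pen + X + Sex + Ration, data = datas)

# Residual approach: the slope comes from Y ~ X alone, factors ignored
naive_slope_fit <- lm(Y ~ X, data = datas)
datas$r <- resid(naive_slope_fit)
resid_anova <- lm(r ~ Pen + Sex + Ration, data = datas)

c(ancova = unname(coef(ancova1)["X"]),
  residual_approach = unname(coef(naive_slope_fit)["X"]))

# Residual df here ignore the df spent estimating the slope
anova(resid_anova)
```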