How do I interpret the TukeyHSD output in R? (in relation to the underlying regression model)

11.6k views Asked by At

I built a simple linear regression model with 'Score' as the dependent variable, and 'Activity' as the independent one. 'Activity' has 5 levels: 'listen' (reference level), 'read1', 'read2', 'watch1', 'watch2'.

Call:
lm(formula = Score ~ Activity)

Residuals:
     Min       1Q   Median       3Q      Max 
-22.6154  -8.6154  -0.6154   7.1346  31.3846 

Coefficients:
               Estimate Std. Error t value Pr(>|t|)    
(Intercept)      41.615      2.553  16.302   <2e-16 ***
Activityread1     6.385      7.937   0.804   0.4254    
Activityread2    20.885      9.552   2.186   0.0340 *  
Activitywatch1    3.885      4.315   0.900   0.3728    
Activitywatch2  -11.415      6.357  -1.796   0.0792 .  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 13.02 on 45 degrees of freedom
Multiple R-squared:  0.1901,    Adjusted R-squared:  0.1181 
F-statistic:  2.64 on 4 and 45 DF,  p-value: 0.04594

In order to obtain all pairwise comparisons, I performed a TukeyHSD test, whose output I'm having difficulty interpreting. While the output of the model shows that the only significant effect we have is due to the contrast between 'listen' and 'read2', the TukeyHSD results yield that the only significant contrast exists between 'watch2' and 'read2'. What does this mean?

> TukeyHSD(aov(mod4), "Activity")
  Tukey multiple comparisons of means
    95% family-wise confidence level

Fit: aov(formula = mod4)

$Activity
                    diff        lwr       upr     p adj
read1-listen    6.384615 -16.168371 28.937602 0.9279144
read2-listen   20.884615  -6.256626 48.025857 0.2034549
watch1-listen   3.884615  -8.376548 16.145779 0.8952957
watch2-listen -11.415385 -29.477206  6.646437 0.3885969
read2-read1    14.500000 -19.264610 48.264610 0.7397464
watch1-read1   -2.500000 -26.031639 21.031639 0.9981234
watch2-read1  -17.800000 -44.811688  9.211688 0.3466391
watch1-read2  -17.000000 -44.959754 10.959754 0.4278714
watch2-read2  -32.300000 -63.245777 -1.354223 0.0368820
watch2-watch1 -15.300000 -34.569930  3.969930 0.1783961

1

There are 1 answers

2
Nate On BEST ANSWER

In your initial model summary, Estimate is showing the estimated difference in mean for each group relative to the mean of the "listen" group (40.615). The "read2" group, has the largest shift (+20.885) away from the "listen" group is called significant with p = .0340 when only these 4 comparisons are calculated.

Since TUKEYHSD is performing all pairwise comparisons for the group means (not just to reference level "listen" anymore), it is also performing p-value adjustments to account for all of these extra tests. Reason being, if you performed 20 comparisons on random data you'd expect one (1/20 or .05) to be called significant with p < .05 simply because of doing that many tests. With the p-value adjustment factored in, your originally significant comparison between "listen - read2" no longer qualifies as significant.

But the larger difference between "watch2 - read2" (-32.3), which wasn't tested in the original model summary, is large enough to be considered significant with p = .03688 even after doing all of the extra comparison adjusting.

Hope that helps, you can read more about the multiple comparrison problem here . And see ?p.adjust for R's implementation of the most popular methods.