I built a simple linear regression model with 'Score' as the dependent variable, and 'Activity' as the independent one. 'Activity' has 5 levels: 'listen' (reference level), 'read1', 'read2', 'watch1', 'watch2'.
Call:
lm(formula = Score ~ Activity)
Residuals:
Min 1Q Median 3Q Max
-22.6154 -8.6154 -0.6154 7.1346 31.3846
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 41.615 2.553 16.302 <2e-16 ***
Activityread1 6.385 7.937 0.804 0.4254
Activityread2 20.885 9.552 2.186 0.0340 *
Activitywatch1 3.885 4.315 0.900 0.3728
Activitywatch2 -11.415 6.357 -1.796 0.0792 .
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 13.02 on 45 degrees of freedom
Multiple R-squared: 0.1901, Adjusted R-squared: 0.1181
F-statistic: 2.64 on 4 and 45 DF, p-value: 0.04594
In order to obtain all pairwise comparisons, I performed a TukeyHSD test, whose output I'm having difficulty interpreting. While the output of the model shows that the only significant effect we have is due to the contrast between 'listen' and 'read2', the TukeyHSD results yield that the only significant contrast exists between 'watch2' and 'read2'. What does this mean?
> TukeyHSD(aov(mod4), "Activity")
Tukey multiple comparisons of means
95% family-wise confidence level
Fit: aov(formula = mod4)
$Activity
diff lwr upr p adj
read1-listen 6.384615 -16.168371 28.937602 0.9279144
read2-listen 20.884615 -6.256626 48.025857 0.2034549
watch1-listen 3.884615 -8.376548 16.145779 0.8952957
watch2-listen -11.415385 -29.477206 6.646437 0.3885969
read2-read1 14.500000 -19.264610 48.264610 0.7397464
watch1-read1 -2.500000 -26.031639 21.031639 0.9981234
watch2-read1 -17.800000 -44.811688 9.211688 0.3466391
watch1-read2 -17.000000 -44.959754 10.959754 0.4278714
watch2-read2 -32.300000 -63.245777 -1.354223 0.0368820
watch2-watch1 -15.300000 -34.569930 3.969930 0.1783961
In your initial model summary,
Estimate
is showing the estimated difference in mean for each group relative to the mean of the "listen" group (40.615). The "read2" group, has the largest shift (+20.885) away from the "listen" group is called significant withp = .0340
when only these 4 comparisons are calculated.Since
TUKEYHSD
is performing all pairwise comparisons for the group means (not just to reference level "listen" anymore), it is also performing p-value adjustments to account for all of these extra tests. Reason being, if you performed 20 comparisons on random data you'd expect one (1/20 or .05) to be called significant withp < .05
simply because of doing that many tests. With the p-value adjustment factored in, your originally significant comparison between "listen - read2" no longer qualifies as significant.But the larger difference between "watch2 - read2" (-32.3), which wasn't tested in the original model summary, is large enough to be considered significant with
p = .03688
even after doing all of the extra comparison adjusting.Hope that helps, you can read more about the multiple comparrison problem here . And see
?p.adjust
for R's implementation of the most popular methods.