How to report ANOVA results when the number of means tested is large

I have 5 means from 5 distributions:

          Mean
Group A   33
Group B   5500
Group C   33
Group D   32223
Group E   80

I want to determine whether the differences in means are significant, so I run an ANOVA. The p-value is < .05, so at least two of the means differ.

n <- 500
# 500 draws from each of 5 normal distributions; stack() returns them in
# column order (x, y, z, a, b), which the labels A to E below follow
value <- stack(data.frame(x = rnorm(n, 33, 7), y = rnorm(n, 5500, 5), z = rnorm(n, 33, 7),
                          a = rnorm(n, 32223, 7), b = rnorm(n, 80, 4)))
ex <- rep(LETTERS[1:5], each = n)
dat <- data.frame(value = value$values, ex)
# Null: all group means are equal. Alternative: at least two means differ.
# A p-value < .05 means we reject the null.
results <- aov(value ~ ex, data = dat)
summary(results)
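If the overall p-value needs to be used in code rather than read off the printed table, it can be pulled out of the summary object. This is a minimal sketch that regenerates data shaped like the question's; the `set.seed(1)` call and the names `fit`, `anova_tab`, `p_overall` and `reject_null` are illustrative additions, not part of the original post.

```r
# Simulated data shaped like the question's (seed added for reproducibility).
set.seed(1)
n <- 500
dat <- data.frame(
  value = c(rnorm(n, 33, 7), rnorm(n, 5500, 5), rnorm(n, 33, 7),
            rnorm(n, 32223, 7), rnorm(n, 80, 4)),
  ex = factor(rep(LETTERS[1:5], each = n))
)

fit <- aov(value ~ ex, data = dat)

# summary() on an aov fit returns a list of ANOVA tables (one per stratum);
# the first holds the table, and "Pr(>F)" is its p-value column.
anova_tab <- summary(fit)[[1]]
p_overall <- anova_tab[["Pr(>F)"]][1]   # row 1 is the `ex` term
reject_null <- p_overall < 0.05

cat("Overall F-test p-value:", format.pval(p_overall), "\n")
cat("Reject the equal-means null at 0.05:", reject_null, "\n")
```

The second row of that column is `NA` because the residual row has no F-test, which is why only the first element is taken.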

Then I want to determine which differences are significant, so I run the TukeyHSD test, and it reports these results:

t <- TukeyHSD(results, conf.level = 0.95)  # adjusted p-value ("p adj") < .05 means the difference is significant
t

  Tukey multiple comparisons of means
    95% family-wise confidence level

Fit: aov(formula = value ~ ex, data = dat)

$ex
             diff           lwr           upr     p adj
B-A   5467.316931  5.466280e+03   5468.353394 0.0000000
C-A      0.591297 -4.451667e-01      1.627761 0.5251299
D-A  32190.195837  3.218916e+04  32191.232301 0.0000000
E-A     47.576884  4.654042e+01     48.613347 0.0000000
C-B  -5466.725634 -5.467762e+03  -5465.689170 0.0000000
D-B  26722.878907  2.672184e+04  26723.915370 0.0000000
E-B  -5419.740047 -5.420777e+03  -5418.703583 0.0000000
D-C  32189.604540  3.218857e+04  32190.641004 0.0000000
E-C     46.985587  4.594912e+01     48.022050 0.0000000
E-D -32142.618953 -3.214366e+04 -32141.582490 0.0000000

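Because nearly every pair is significant here, one compact way to report such a table is to give the count of significant pairs and list only the exceptions. A minimal sketch, assuming data simulated like the question's (the seed and the names `tk`, `cmp` and `is_sig` are illustrative); which pairs come out non-significant depends on the random draw.

```r
# Simulated data shaped like the question's (seed added for reproducibility).
set.seed(1)
n <- 500
dat <- data.frame(
  value = c(rnorm(n, 33, 7), rnorm(n, 5500, 5), rnorm(n, 33, 7),
            rnorm(n, 32223, 7), rnorm(n, 80, 4)),
  ex = factor(rep(LETTERS[1:5], each = n))
)
tk <- TukeyHSD(aov(value ~ ex, data = dat), conf.level = 0.95)

# tk$ex is a matrix with columns "diff", "lwr", "upr", "p adj";
# its row names are the pairs, e.g. "B-A".
cmp <- tk$ex
is_sig <- cmp[, "p adj"] < 0.05

# Report the count, then list only the minority class.
cat(sum(is_sig), "of", nrow(cmp), "pairwise differences are significant at 0.05.\n")
cat("Not significant:", rownames(cmp)[!is_sig], "\n")
```

With 50 groups the same two lines still produce a short summary, as long as one of the two classes (significant or not) is small.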
My question is how to report the results of TukeyHSD to an audience. There are 10 differences and only C-A is not significant, but what I actually report to my audience is just the means:

          Mean
Group A   33
Group B   5500
Group C   33
Group D   32223
Group E   80

In reality I have 50 means, not 5, so the Tukey HSD test would return (50^2 - 50)/2 = 1225 differences! How do I report on those 1225 differences?
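The count comes from the number of distinct pairs among k means, k(k - 1)/2, which is the binomial coefficient "k choose 2". A small sketch; `n_pairs` is a hypothetical helper name used only for illustration.

```r
# Number of distinct pairwise comparisons among k group means: k * (k - 1) / 2,
# the same value as base R's choose(k, 2).
n_pairs <- function(k) k * (k - 1) / 2

n_pairs(5)    # 10 pairs for the 5 groups in the Tukey table
n_pairs(50)   # 1225 pairs for 50 groups
```

The count grows roughly with the square of the number of groups, which is why a full table stops being readable long before 50 groups.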

I know this question is more about reporting than statistics, but it seems like a real problem. How should one communicate that some of the differences are significant while others are not when the number of means tested is large?

Thank you.

Answer by Jthorpe

Consider using a heat map:

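# the unique values of `ex` (the group labels A to E)
uex <- unique(ex)

# create a square matrix to hold the pairwise comparisons
mat <- matrix(NA, length(uex), length(uex))
dimnames(mat) <- list(uex, uex)

# fill the lower triangle with the adjusted p-values; lower.tri() walks the
# matrix column by column, which matches TukeyHSD's row order (B-A, C-A, ..., E-D)
mat[lower.tri(mat)] <- t$ex[, 'p adj']

# plot a heat map using image(); t() still calls base::t() here because R
# skips the non-function object named `t` when looking up a function
image(t(mat),
      breaks = c(-1, .01, .05, 2),        # p < .01, .01 to .05, > .05
      col = c('red', 'green', 'white'),   # colors to indicate significance
      axes = FALSE)

# make nice labels
box()
axis(1, at = (seq(length(uex)) - 1) / (length(uex) - 1), labels = uex)
axis(2, at = (seq(length(uex)) - 1) / (length(uex) - 1), labels = uex)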

You could also use heatmap(mat), but assigning colors according to significance level is more difficult there, and heatmap() reorders the rows and columns by clustering by default.
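With 50 groups, filling the matrix via lower.tri() only lines up if the labels are in the same order as the factor levels. A sketch of an alternative that places each p-value by its row name ("B-A" goes to row B, column A) instead; `tukey_p_matrix` is a hypothetical helper, and it assumes no group label itself contains a "-".

```r
# Hypothetical helper: square matrix of adjusted p-values from one term of a
# TukeyHSD result, filled by pair name rather than by position.
tukey_p_matrix <- function(tk_term) {
  # Split "B-A" into c("B", "A"): first label -> row, second label -> column.
  pr <- do.call(rbind, strsplit(rownames(tk_term), "-", fixed = TRUE))
  lev <- unique(c(pr[, 2], pr[, 1]))
  mat <- matrix(NA_real_, length(lev), length(lev),
                dimnames = list(lev, lev))
  idx <- cbind(match(pr[, 1], lev), match(pr[, 2], lev))
  mat[idx] <- tk_term[, "p adj"]
  mat
}

# Usage with simulated data shaped like the question's (seed is an assumption).
set.seed(1)
n <- 500
dat <- data.frame(
  value = c(rnorm(n, 33, 7), rnorm(n, 5500, 5), rnorm(n, 33, 7),
            rnorm(n, 32223, 7), rnorm(n, 80, 4)),
  ex = factor(rep(LETTERS[1:5], each = n))
)
tk <- TukeyHSD(aov(value ~ ex, data = dat))
pmat <- tukey_p_matrix(tk$ex)
print(round(pmat, 3))
```

The resulting `pmat` has the same lower-triangular layout as `mat` above, so it can be passed to the same image() and axis() calls.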