I have a dataframe with about 100 columns and over 1000 samples. for statistical analyses, I would like to create a function to:
- Create a summarizing table of multiple combinations of two-way ANOVA tests and its tukey results each time
- with the same columns as Y and Z elements and with a list of 15 X elements (
metals_to_analys
) to run through. The different Xs should be the rows of the dataframe. - Naturally, since each time Im using the same Y and Z, I should get the same combinations of the results of ANOVA and tukey for all of the X elements. I would like to have them all on a table. each result of a combination should be a column.
- I would like to use this function many times, with different Y and Z columns and the same list of X elements to run through.
the simple code
two.way <- aov(df[,c(metal)]~treatment*sex, df)
print(summary(two.way))
print(TukeyHSD(two.way))
}
results are:
[1] "common spiny"
[1] "Y= 51 V ug/g"
[1] "X= treatment"
[1] "Z= sex"
Df Sum Sq Mean Sq F value Pr(>F)
x 1 20.72 20.717 2.151 0.154
z 1 11.57 11.569 1.201 0.283
x:z 1 1.63 1.626 0.169 0.684
Residuals 27 260.02 9.630
21 observations deleted due to missingness
Tukey multiple comparisons of means
95% family-wise confidence level
Fit: aov(formula = y ~ x * z)
$x
diff lwr upr p adj
pollution-control -1.678333 -4.026221 0.6695539 0.1540083
$z
diff lwr upr p adj
male-female -1.323958 -3.937538 1.289621 0.3078405
$`x:z`
diff lwr upr p adj
pollution:female-control:female -2.528571 -11.607275 6.550133 0.8706417
control:male-control:female -1.723571 -5.762489 2.315347 0.6517362
pollution:male-control:female -2.788571 -6.894570 1.317427 0.2692529
control:male-pollution:female 0.805000 -8.034118 9.644118 0.9944325
pollution:male-pollution:female -0.260000 -9.129970 8.609970 0.9998100
pollution:male-control:male -1.065000 -4.609907 2.479907 0.8434846
[1] "common spiny"
[1] "Y= 52 Cr ug/g"
[1] "X= treatment"
[1] "Z= sex"
Df Sum Sq Mean Sq F value Pr(>F)
x 1 2.6 2.643 0.162 0.690
z 1 5.1 5.073 0.311 0.581
x:z 1 0.6 0.563 0.035 0.854
Residuals 27 440.0 16.297
21 observations deleted due to missingness
Tukey multiple comparisons of means
95% family-wise confidence level
Fit: aov(formula = y ~ x * z)
$x
diff lwr upr p adj
pollution-control -0.5994737 -3.653753 2.454806 0.6903255
$z
diff lwr upr p adj
male-female -0.8767077 -4.276617 2.523202 0.6010668
$`x:z`
diff lwr upr p adj
pollution:female-control:female -1.06857143 -12.878722 10.741579 0.9945388
control:male-control:female -1.12273810 -6.376817 4.131341 0.9358132
pollution:male-control:female -1.33038961 -6.671731 4.010952 0.9031794
control:male-pollution:female -0.05416667 -11.552649 11.444316 0.9999992
pollution:male-pollution:female -0.26181818 -11.800435 11.276799 0.9999118
pollution:male-control:male -0.20765152 -4.819090 4.403787 0.9993138
[1] "common spiny"
[1] "Y= 75 As ug/g"
[1] "X= treatment"
[1] "Z= sex"
Df Sum Sq Mean Sq F value Pr(>F)
x 1 0.7 0.669 0.057 0.812
z 1 0.9 0.851 0.073 0.789
x:z 1 4.1 4.119 0.353 0.557
Residuals 34 397.3 11.685
14 observations deleted due to missingness
Tukey multiple comparisons of means
95% family-wise confidence level
Fit: aov(formula = y ~ x * z)
$x
diff lwr upr p adj
pollution-control 0.2750214 -2.06119 2.611233 0.8123573
$z
diff lwr upr p adj
male-female 0.3205038 -2.238687 2.879695 0.800634
$`x:z`
diff lwr upr p adj
pollution:female-control:female -1.84764265 -11.579272 7.883986 0.9554783
control:male-control:female 0.03823081 -3.854421 3.930882 0.9999932
pollution:male-control:female 0.46403555 -3.539330 4.467401 0.9891785
control:male-pollution:female 1.88587347 -7.649137 11.420884 0.9501116
pollution:male-pollution:female 2.31167820 -7.269064 11.892420 0.9142366
pollution:male-control:male 0.42580474 -3.072588 3.924197 0.9875212
[1] "common spiny"
[1] "Y= 27 Al ug/g"
[1] "X= treatment"
[1] "Z= sex"
Df Sum Sq Mean Sq F value Pr(>F)
x 1 798477 798477 0.810 0.376
z 1 1239534 1239534 1.258 0.272
x:z 1 85426 85426 0.087 0.771
Residuals 27 26606536 985427
21 observations deleted due to missingness
Tukey multiple comparisons of means
95% family-wise confidence level
Fit: aov(formula = y ~ x * z)
$x
diff lwr upr p adj
pollution-control -329.4919 -1080.539 421.5552 0.3759949
$z
diff lwr upr p adj
male-female -433.3621 -1269.399 402.6753 0.296943
$`x:z`
diff lwr upr p adj
pollution:female-control:female -483.16714 -3387.282 2420.9480 0.9680176
control:male-control:female -539.49881 -1831.476 752.4789 0.6669326
pollution:male-control:female -687.23351 -2000.669 626.2020 0.4912043
control:male-pollution:female -56.33167 -2883.808 2771.1442 0.9999403
pollution:male-pollution:female -204.06636 -3041.411 2633.2786 0.9972313
pollution:male-control:male -147.73470 -1281.687 986.2177 0.9841315
[1] "common spiny"
[1] "Y= 55 Mn ug/g"
[1] "X= treatment"
[1] "Z= sex"
Df Sum Sq Mean Sq F value Pr(>F)
x 1 1131 1131.0 1.117 0.298
z 1 950 949.8 0.938 0.339
x:z 1 298 297.5 0.294 0.591
Residuals 36 36439 1012.2
12 observations deleted due to missingness
Tukey multiple comparisons of means
95% family-wise confidence level
Fit: aov(formula = y ~ x * z)
$x
diff lwr upr p adj
pollution-control -10.98384 -32.05736 10.08969 0.2975203
$z
diff lwr upr p adj
male-female -10.20296 -32.46593 12.06002 0.3588413
$`x:z`
diff lwr upr p adj
pollution:female-control:female -20.0302965 -86.40206 46.34147 0.8480157
control:male-control:female -14.4163543 -49.39734 20.56464 0.6858393
pollution:male-control:female -19.5726288 -55.61389 16.46863 0.4700133
control:male-pollution:female 5.6139421 -58.88782 70.11570 0.9953822
pollution:male-pollution:female 0.4576676 -64.62520 65.54053 0.9999975
pollution:male-control:male -5.1562745 -37.62531 27.31276 0.9733521
[1] "common spiny"
[1] "Y= 60 Ni ug/g"
[1] "X= treatment"
[1] "Z= sex"
Df Sum Sq Mean Sq F value Pr(>F)
x 1 0.2 0.166 0.012 0.913
z 1 2.2 2.239 0.163 0.689
x:z 1 6.7 6.700 0.489 0.490
Residuals 27 369.8 13.698
21 observations deleted due to missingness
Tukey multiple comparisons of means
95% family-wise confidence level
Fit: aov(formula = y ~ x * z)
$x
diff lwr upr p adj
pollution-control 0.1500877 -2.65007 2.950245 0.9132405
$z
diff lwr upr p adj
male-female -0.5824766 -3.699507 2.534553 0.7044081
$`x:z`
diff lwr upr p adj
pollution:female-control:female -2.2428571 -13.070380 8.584665 0.9410477
control:male-control:female -1.1578571 -5.974787 3.659072 0.9118752
pollution:male-control:female -0.4301299 -5.327061 4.466801 0.9949961
control:male-pollution:female 1.0850000 -9.456786 11.626786 0.9920208
pollution:male-pollution:female 1.8127273 -8.765854 12.391308 0.9652341
pollution:male-control:male 0.7277273 -3.500030 4.955485 0.9647902
[1] "common spiny"
[1] "Y= 63 Cu ug/g"
[1] "X= treatment"
[1] "Z= sex"
Df Sum Sq Mean Sq F value Pr(>F)
x 1 787 787.4 5.161 0.0292 *
z 1 845 844.9 5.538 0.0242 *
x:z 1 408 407.8 2.673 0.1108
Residuals 36 5493 152.6
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
12 observations deleted due to missingness
Tukey multiple comparisons of means
95% family-wise confidence level
Fit: aov(formula = y ~ x * z)
$x
diff lwr upr p adj
pollution-control -9.164472 -17.34606 -0.9828865 0.0291785
$z
diff lwr upr p adj
male-female -9.623152 -18.26653 -0.9797746 0.0301051
$`x:z`
diff lwr upr p adj
pollution:female-control:female -20.428749 -46.19692 5.3394268 0.1614714
control:male-control:female -14.355977 -27.93700 -0.7749558 0.0349041
pollution:male-control:female -17.370260 -31.36292 -3.3775999 0.0100236
control:male-pollution:female 6.072772 -18.96939 31.1149371 0.9137667
pollution:male-pollution:female 3.058489 -22.20929 28.3262635 0.9878328
pollution:male-control:male -3.014283 -15.62006 9.5914975 0.9169459
[1] "common spiny"
[1] "Y= 66 Zn ug/g"
[1] "X= treatment"
[1] "Z= sex"
Df Sum Sq Mean Sq F value Pr(>F)
x 1 6856 6856 1.431 0.240
z 1 1930 1930 0.403 0.530
x:z 1 1675 1675 0.350 0.558
Residuals 34 162920 4792
14 observations deleted due to missingness
Tukey multiple comparisons of means
95% family-wise confidence level
Fit: aov(formula = y ~ x * z)
$x
diff lwr upr p adj
pollution-control -27.84621 -75.1554 19.46299 0.2399019
$z
diff lwr upr p adj
male-female 15.26331 -36.56132 67.08793 0.553451
$`x:z`
diff lwr upr p adj
pollution:female-control:female 7.539945 -189.52938 204.60927 0.9995968
control:male-control:female 23.656420 -55.17131 102.48415 0.8490505
pollution:male-control:female -14.645628 -95.71535 66.42410 0.9612648
control:male-pollution:female 16.116475 -176.97124 209.20419 0.9958814
pollution:male-pollution:female -22.185573 -216.19937 171.82823 0.9895988
pollution:male-control:male -38.302048 -109.14587 32.54178 0.4719761
[1] "common spiny"
[1] "Y= 95 Mo ug/g"
[1] "X= treatment"
[1] "Z= sex"
Df Sum Sq Mean Sq F value Pr(>F)
x 1 3.99 3.986 0.980 0.331
z 1 3.12 3.119 0.767 0.389
x:z 1 2.17 2.172 0.534 0.471
Residuals 27 109.85 4.068
21 observations deleted due to missingness
Tukey multiple comparisons of means
95% family-wise confidence level
Fit: aov(formula = y ~ x * z)
$x
diff lwr upr p adj
pollution-control -0.7361404 -2.262194 0.789913 0.3310763
$z
diff lwr upr p adj
male-female -0.6874504 -2.386196 1.011295 0.4136302
$`x:z`
diff lwr upr p adj
pollution:female-control:female -1.9857143 -7.886589 3.915160 0.7939575
control:male-control:female -1.0548810 -3.680052 1.570290 0.6927751
pollution:male-control:female -1.3493506 -4.018121 1.319420 0.5200268
control:male-pollution:female 0.9308333 -4.814318 6.675984 0.9703339
pollution:male-pollution:female 0.6363636 -5.128840 6.401568 0.9902062
pollution:male-control:male -0.2944697 -2.598549 2.009609 0.9849908
[1] "common spiny"
[1] "Y= 111 Cd ug/g"
[1] "X= treatment"
[1] "Z= sex"
Df Sum Sq Mean Sq F value Pr(>F)
x 1 0.624 0.6236 1.122 0.297
z 1 0.002 0.0016 0.003 0.957
x:z 1 0.114 0.1138 0.205 0.654
Residuals 34 18.892 0.5557
14 observations deleted due to missingness
Tukey multiple comparisons of means
95% family-wise confidence level
Fit: aov(formula = y ~ x * z)
$x
diff lwr upr p adj
pollution-control 0.2655667 -0.2438843 0.7750178 0.2969001
$z
diff lwr upr p adj
male-female 0.01396609 -0.5441096 0.5720418 0.9597364
$`x:z`
diff lwr upr p adj
pollution:female-control:female -0.07381750 -2.1959668 2.0483318 0.9996968
control:male-control:female -0.03783619 -0.8866959 0.8110235 0.9993635
pollution:male-control:female 0.26620655 -0.6067962 1.1392093 0.8429160
control:male-pollution:female 0.03598131 -2.0432918 2.1152544 0.9999626
pollution:male-pollution:female 0.34002405 -1.7492217 2.4292698 0.9711681
pollution:male-control:male 0.30404274 -0.4588419 1.0669274 0.7060267
[1] "common spiny"
[1] "Y= 88 Sr ug/g"
[1] "X= treatment"
[1] "Z= sex"
Df Sum Sq Mean Sq F value Pr(>F)
x 1 333 333.4 1.363 0.254
z 1 647 646.8 2.645 0.117
Residuals 24 5870 244.6
25 observations deleted due to missingness
Tukey multiple comparisons of means
95% family-wise confidence level
Fit: aov(formula = y ~ x * z)
$x
diff lwr upr p adj
pollution-control -7.151225 -19.79305 5.490602 0.254475
$z
diff lwr upr p adj
male-female 10.55384 -4.387205 25.49488 0.1578397
[1] "common spiny"
[1] "Y= 47 Ti ug/g"
[1] "X= treatment"
[1] "Z= sex"
Df Sum Sq Mean Sq F value Pr(>F)
x 1 550 550.3 0.810 0.381
z 1 1 0.8 0.001 0.974
Residuals 17 11549 679.4
32 observations deleted due to missingness
Tukey multiple comparisons of means
95% family-wise confidence level
Fit: aov(formula = y ~ x * z)
$x
diff lwr upr p adj
pollution-control -10.54424 -35.26139 14.17291 0.3806725
$z
diff lwr upr p adj
male-female -0.4332386 -31.17481 30.30833 0.9766259
Show in New Window
Error in `$<-.data.frame`(`*tmp*`, "Name", value = "51 V ug/g") :
The desired ideal result will be something like this (or any other reasonable summary of this long, almost unreadable list):
metal | Y effect pv | Z effect pv | Y:Z effect | tukey combination #1 | tukey combination #2 | tukey combination #3 |
---|---|---|---|---|---|---|
51V | ------ | --------- | ------- | ---------- | ---------- | ------------ |
27Al | ------ | --------- | ------- | ---------- | ---------- | ------------ |
65Cu | ------ | --------- | ------- | ---------- | ---------- | ------------ |
the closest I got was using this answer to a similar question: https://stackoverflow.com/a/75540191/18907906 , which I really liked because of the minimalist writing, but I didn't find the way to fit it to two-way ANOVA, so I couldn't make this last cade row to work properly.
Other limitations I couldn't solve were:
- not all of the columns are supposed to be analyzed so solutions that run through all columns raise an error.
- some of the combinations inside the "loop" (aka the combination of specific X from the list with the specific Y and Z chosen) are not compatible, so I need the function to skip them or write NA.
- I wish for the function to be as short and as general as possible since I would like to use it for many combinations and sometimes inside another function.
a sample of the dataframe:
structure(list(ID = c("o35Fe", "o22Fe", "o21Fe", "o28Fe", "o20Fe",
NA, NA, NA, NA), age = c(4L, 4L, 4L, 4L, 3L, NA, NA, NA, NA),
`individual weight gr` = c(19.2, 20.8, 24.3, 24.4, 19.1,
NA, NA, NA, NA), texa = c("Acrocephalus stentoreus", "Acrocephalus stentoreus",
"Acrocephalus stentoreus", "Acrocephalus stentoreus", "Acrocephalus stentoreus",
NA, NA, NA, NA), treatment = c("pollution", "pollution",
"pollution", "pollution", "pollution", NA, NA, NA, NA), year = c(2018L,
2018L, 2018L, 2018L, 2018L, NA, NA, NA, NA), `9 Be ug/g` = c(NA,
NA, NA, NA, NA, NA, NA, NA, NA), `27 Al ug/g` = c(NA, NA,
NA, NA, NA, NA, NA, NA, NA), `47 Ti ug/g` = c(0.82, NA, NA,
1.1, NA, NA, NA, NA, NA), `51 V ug/g` = c(0.26, NA, NA, 0.19,
NA, NA, NA, NA, NA), `52 Cr ug/g` = c(1.22, NA, NA, 1.395,
NA, NA, NA, NA, NA), `55 Mn ug/g` = c(8.46, 38.93, 248.44,
13.1, 20.03, NA, NA, NA, NA), `56 Fe ug/g` = c(NA, NA, NA,
NA, NA, NA, NA, NA, NA), `59 Co ug/g` = c(NA, NA, NA, NA,
NA, NA, NA, NA, NA), `60 Ni ug/g` = c(-1.4, NA, NA, -0.95,
NA, NA, NA, NA, NA), `63 Cu ug/g` = c(0.91, 8.915, 79.715,
4.15, 31.10333333, NA, NA, NA, NA), `66 Zn ug/g` = c(95.64,
NA, NA, 106.03, NA, NA, NA, NA, NA), `75 As ug/g` = c(0.13,
NA, NA, 0.17, NA, NA, NA, NA, NA), `78 Se ug/g` = c(NA, NA,
NA, NA, NA, NA, NA, NA, NA), `88 Sr ug/g` = c(35.87, NA,
NA, 8.205, NA, NA, NA, NA, NA), `95 Mo ug/g` = c(0.28, NA,
NA, 0.05, NA, NA, NA, NA, NA), `107 Ag ug/g` = c(NA, NA,
NA, NA, NA, NA, NA, NA, NA), `111 Cd ug/g` = c(0.07, NA,
NA, 0.11, NA, NA, NA, NA, NA), c(NA, NA, NA, NA, NA, NA,
NA, NA, NA), ` 1` = c(NA, NA, NA, NA, NA, NA, NA, NA, NA),
` 2` = c(NA, NA, NA, NA, NA, NA, NA, NA, NA), ` 3` = c(NA,
NA, NA, NA, NA, NA, NA, NA, NA), ` 4` = c(NA, NA, NA, NA,
NA, NA, NA, NA, NA), ` 5` = c(NA, NA, NA, NA, NA, NA, NA,
NA, NA), ` 6` = c(NA, NA, NA, NA, NA, NA, NA, NA, NA), ` 7` = c(NA,
NA, NA, NA, NA, NA, NA, NA, NA), ` 8` = c(NA, NA, NA, NA,
NA, NA, NA, NA, NA), ` 9` = c(NA, NA, NA, NA, NA, NA, NA,
NA, NA), ` 10` = c(NA, NA, NA, NA, NA, NA, NA, NA, NA), ` 11` = c(NA,
NA, NA, NA, NA, NA, NA, NA, NA), ` 12` = c(NA, NA, NA, NA,
NA, NA, NA, NA, NA), ` 13` = c(NA, NA, NA, NA, NA, NA, NA,
NA, NA), ` 14` = c(NA, NA, NA, NA, NA, NA, NA, NA, NA), ` 15` = c(NA,
NA, NA, NA, NA, NA, NA, NA, NA), ` 16` = c(NA, NA, NA, NA,
NA, NA, NA, NA, NA), ` 17` = c(NA, NA, NA, NA, NA, NA, NA,
NA, NA), ` 18` = c(NA, NA, NA, NA, NA, NA, NA, NA, NA), ` 19` = c(NA,
NA, NA, NA, NA, NA, NA, NA, NA), ` 20` = c(NA, NA, NA, NA,
NA, NA, NA, NA, NA), ` 21` = c(NA, NA, NA, NA, NA, NA, NA,
NA, NA), ` 22` = c(NA, NA, NA, NA, NA, NA, NA, NA, NA), ` 23` = c(NA,
NA, NA, NA, NA, NA, NA, NA, NA), ` 24` = c(NA, NA, NA, NA,
NA, NA, NA, NA, NA), ` 25` = c(NA, NA, NA, NA, NA, NA, NA,
NA, NA), ` 26` = c(NA, NA, NA, NA, NA, NA, NA, NA, NA), `114 Cd ug/g` = c(NA,
NA, NA, NA, NA, NA, NA, NA, NA), `118 Sn ug/g` = c(NA, NA,
NA, NA, NA, NA, NA, NA, NA), `121 Sb ug/g` = c(NA, NA, NA,
NA, NA, NA, NA, NA, NA), `137 Ba ug/g` = c(NA, NA, NA, NA,
NA, NA, NA, NA, NA), `208 Pb ug/g` = c(NA, NA, NA, NA, NA,
NA, NA, NA, NA), `trophic level` = c(NA, NA, NA, NA, NA,
NA, NA, NA, NA), `tarsus mm` = c(NA, NA, NA, NA, NA, NA,
NA, NA, NA), Rec = c(NA, NA, NA, NA, NA, NA, NA, NA, NA),
Ring = c(NA, NA, NA, NA, NA, NA, NA, NA, NA), Species = c(NA,
NA, NA, NA, NA, NA, NA, NA, NA), `Species 1` = c(NA, NA,
NA, NA, NA, NA, NA, NA, NA), Sex = c(NA, NA, NA, NA, NA,
NA, NA, NA, NA), Wing = c(NA, NA, NA, NA, NA, NA, NA, NA,
NA), Tail = c(NA, NA, NA, NA, NA, NA, NA, NA, NA), Fat = c(NA,
NA, NA, NA, NA, NA, NA, NA, NA), Time = c(NA, NA, NA, NA,
NA, NA, NA, NA, NA), Place = c(NA, NA, NA, NA, NA, NA, NA,
NA, NA), `Coor N` = c(NA, NA, NA, NA, NA, NA, NA, NA, NA),
`Coor E` = c(NA, NA, NA, NA, NA, NA, NA, NA, NA), Ringer = c(NA,
NA, NA, NA, NA, NA, NA, NA, NA), Remarks = c(NA, NA, NA,
NA, NA, NA, NA, NA, NA), `Incubation P` = c(NA, NA, NA, NA,
NA, NA, NA, NA, NA), TibuaID = c(NA, NA, NA, NA, NA, NA,
NA, NA, NA), RecID = c(NA, NA, NA, NA, NA, NA, NA, NA, NA
), userID = c(NA, NA, NA, NA, NA, NA, NA, NA, NA), speciesID = c(NA,
NA, NA, NA, NA, NA, NA, NA, NA), `Digest L` = c(NA, NA, NA,
NA, NA, NA, NA, NA, NA)), row.names = c(NA, -9L), class = "data.frame")