R: How to create general function to summarize multiple two-way ANOVAs and its Tukey tests into a single dataframe

58 views Asked by At

I have a dataframe with about 100 columns and over 1000 samples. for statistical analyses, I would like to create a function to:

  1. Create a summarizing table of multiple combinations of two-way ANOVA tests and its tukey results each time
  2. with the same columns as Y and Z elements and with a list of 15 X elements (metals_to_analys) to run through. The different Xs should be the rows of the dataframe.
  3. Naturally, since each time Im using the same Y and Z, I should get the same combinations of the results of ANOVA and tukey for all of the X elements. I would like to have them all on a table. each result of a combination should be a column.
  4. I would like to use this function many times, with different Y and Z columns and the same list of X elements to run through.

the simple code

  two.way <- aov(df[,c(metal)]~treatment*sex, df)
  print(summary(two.way))
  print(TukeyHSD(two.way))
}

results are:

[1] "common spiny"
[1] "Y= 51 V ug/g"
[1] "X= treatment"
[1] "Z= sex"
            Df Sum Sq Mean Sq F value Pr(>F)
x            1  20.72  20.717   2.151  0.154
z            1  11.57  11.569   1.201  0.283
x:z          1   1.63   1.626   0.169  0.684
Residuals   27 260.02   9.630               
21 observations deleted due to missingness
  Tukey multiple comparisons of means
    95% family-wise confidence level

Fit: aov(formula = y ~ x * z)

$x
                       diff       lwr       upr     p adj
pollution-control -1.678333 -4.026221 0.6695539 0.1540083

$z
                 diff       lwr      upr     p adj
male-female -1.323958 -3.937538 1.289621 0.3078405

$`x:z`
                                     diff        lwr      upr     p adj
pollution:female-control:female -2.528571 -11.607275 6.550133 0.8706417
control:male-control:female     -1.723571  -5.762489 2.315347 0.6517362
pollution:male-control:female   -2.788571  -6.894570 1.317427 0.2692529
control:male-pollution:female    0.805000  -8.034118 9.644118 0.9944325
pollution:male-pollution:female -0.260000  -9.129970 8.609970 0.9998100
pollution:male-control:male     -1.065000  -4.609907 2.479907 0.8434846

[1] "common spiny"
[1] "Y= 52 Cr ug/g"
[1] "X= treatment"
[1] "Z= sex"
            Df Sum Sq Mean Sq F value Pr(>F)
x            1    2.6   2.643   0.162  0.690
z            1    5.1   5.073   0.311  0.581
x:z          1    0.6   0.563   0.035  0.854
Residuals   27  440.0  16.297               
21 observations deleted due to missingness
  Tukey multiple comparisons of means
    95% family-wise confidence level

Fit: aov(formula = y ~ x * z)

$x
                        diff       lwr      upr     p adj
pollution-control -0.5994737 -3.653753 2.454806 0.6903255

$z
                  diff       lwr      upr     p adj
male-female -0.8767077 -4.276617 2.523202 0.6010668

$`x:z`
                                       diff        lwr       upr     p adj
pollution:female-control:female -1.06857143 -12.878722 10.741579 0.9945388
control:male-control:female     -1.12273810  -6.376817  4.131341 0.9358132
pollution:male-control:female   -1.33038961  -6.671731  4.010952 0.9031794
control:male-pollution:female   -0.05416667 -11.552649 11.444316 0.9999992
pollution:male-pollution:female -0.26181818 -11.800435 11.276799 0.9999118
pollution:male-control:male     -0.20765152  -4.819090  4.403787 0.9993138

[1] "common spiny"
[1] "Y= 75 As ug/g"
[1] "X= treatment"
[1] "Z= sex"
            Df Sum Sq Mean Sq F value Pr(>F)
x            1    0.7   0.669   0.057  0.812
z            1    0.9   0.851   0.073  0.789
x:z          1    4.1   4.119   0.353  0.557
Residuals   34  397.3  11.685               
14 observations deleted due to missingness
  Tukey multiple comparisons of means
    95% family-wise confidence level

Fit: aov(formula = y ~ x * z)

$x
                       diff      lwr      upr     p adj
pollution-control 0.2750214 -2.06119 2.611233 0.8123573

$z
                 diff       lwr      upr    p adj
male-female 0.3205038 -2.238687 2.879695 0.800634

$`x:z`
                                       diff        lwr       upr     p adj
pollution:female-control:female -1.84764265 -11.579272  7.883986 0.9554783
control:male-control:female      0.03823081  -3.854421  3.930882 0.9999932
pollution:male-control:female    0.46403555  -3.539330  4.467401 0.9891785
control:male-pollution:female    1.88587347  -7.649137 11.420884 0.9501116
pollution:male-pollution:female  2.31167820  -7.269064 11.892420 0.9142366
pollution:male-control:male      0.42580474  -3.072588  3.924197 0.9875212

[1] "common spiny"
[1] "Y= 27 Al ug/g"
[1] "X= treatment"
[1] "Z= sex"
            Df   Sum Sq Mean Sq F value Pr(>F)
x            1   798477  798477   0.810  0.376
z            1  1239534 1239534   1.258  0.272
x:z          1    85426   85426   0.087  0.771
Residuals   27 26606536  985427               
21 observations deleted due to missingness
  Tukey multiple comparisons of means
    95% family-wise confidence level

Fit: aov(formula = y ~ x * z)

$x
                       diff       lwr      upr     p adj
pollution-control -329.4919 -1080.539 421.5552 0.3759949

$z
                 diff       lwr      upr    p adj
male-female -433.3621 -1269.399 402.6753 0.296943

$`x:z`
                                      diff       lwr       upr     p adj
pollution:female-control:female -483.16714 -3387.282 2420.9480 0.9680176
control:male-control:female     -539.49881 -1831.476  752.4789 0.6669326
pollution:male-control:female   -687.23351 -2000.669  626.2020 0.4912043
control:male-pollution:female    -56.33167 -2883.808 2771.1442 0.9999403
pollution:male-pollution:female -204.06636 -3041.411 2633.2786 0.9972313
pollution:male-control:male     -147.73470 -1281.687  986.2177 0.9841315

[1] "common spiny"
[1] "Y= 55 Mn ug/g"
[1] "X= treatment"
[1] "Z= sex"
            Df Sum Sq Mean Sq F value Pr(>F)
x            1   1131  1131.0   1.117  0.298
z            1    950   949.8   0.938  0.339
x:z          1    298   297.5   0.294  0.591
Residuals   36  36439  1012.2               
12 observations deleted due to missingness
  Tukey multiple comparisons of means
    95% family-wise confidence level

Fit: aov(formula = y ~ x * z)

$x
                       diff       lwr      upr     p adj
pollution-control -10.98384 -32.05736 10.08969 0.2975203

$z
                 diff       lwr      upr     p adj
male-female -10.20296 -32.46593 12.06002 0.3588413

$`x:z`
                                       diff       lwr      upr     p adj
pollution:female-control:female -20.0302965 -86.40206 46.34147 0.8480157
control:male-control:female     -14.4163543 -49.39734 20.56464 0.6858393
pollution:male-control:female   -19.5726288 -55.61389 16.46863 0.4700133
control:male-pollution:female     5.6139421 -58.88782 70.11570 0.9953822
pollution:male-pollution:female   0.4576676 -64.62520 65.54053 0.9999975
pollution:male-control:male      -5.1562745 -37.62531 27.31276 0.9733521

[1] "common spiny"
[1] "Y= 60 Ni ug/g"
[1] "X= treatment"
[1] "Z= sex"
            Df Sum Sq Mean Sq F value Pr(>F)
x            1    0.2   0.166   0.012  0.913
z            1    2.2   2.239   0.163  0.689
x:z          1    6.7   6.700   0.489  0.490
Residuals   27  369.8  13.698               
21 observations deleted due to missingness
  Tukey multiple comparisons of means
    95% family-wise confidence level

Fit: aov(formula = y ~ x * z)

$x
                       diff      lwr      upr     p adj
pollution-control 0.1500877 -2.65007 2.950245 0.9132405

$z
                  diff       lwr      upr     p adj
male-female -0.5824766 -3.699507 2.534553 0.7044081

$`x:z`
                                      diff        lwr       upr     p adj
pollution:female-control:female -2.2428571 -13.070380  8.584665 0.9410477
control:male-control:female     -1.1578571  -5.974787  3.659072 0.9118752
pollution:male-control:female   -0.4301299  -5.327061  4.466801 0.9949961
control:male-pollution:female    1.0850000  -9.456786 11.626786 0.9920208
pollution:male-pollution:female  1.8127273  -8.765854 12.391308 0.9652341
pollution:male-control:male      0.7277273  -3.500030  4.955485 0.9647902

[1] "common spiny"
[1] "Y= 63 Cu ug/g"
[1] "X= treatment"
[1] "Z= sex"
            Df Sum Sq Mean Sq F value Pr(>F)  
x            1    787   787.4   5.161 0.0292 *
z            1    845   844.9   5.538 0.0242 *
x:z          1    408   407.8   2.673 0.1108  
Residuals   36   5493   152.6                 
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
12 observations deleted due to missingness
  Tukey multiple comparisons of means
    95% family-wise confidence level

Fit: aov(formula = y ~ x * z)

$x
                       diff       lwr        upr     p adj
pollution-control -9.164472 -17.34606 -0.9828865 0.0291785

$z
                 diff       lwr        upr     p adj
male-female -9.623152 -18.26653 -0.9797746 0.0301051

$`x:z`
                                      diff       lwr        upr     p adj
pollution:female-control:female -20.428749 -46.19692  5.3394268 0.1614714
control:male-control:female     -14.355977 -27.93700 -0.7749558 0.0349041
pollution:male-control:female   -17.370260 -31.36292 -3.3775999 0.0100236
control:male-pollution:female     6.072772 -18.96939 31.1149371 0.9137667
pollution:male-pollution:female   3.058489 -22.20929 28.3262635 0.9878328
pollution:male-control:male      -3.014283 -15.62006  9.5914975 0.9169459

[1] "common spiny"
[1] "Y= 66 Zn ug/g"
[1] "X= treatment"
[1] "Z= sex"
            Df Sum Sq Mean Sq F value Pr(>F)
x            1   6856    6856   1.431  0.240
z            1   1930    1930   0.403  0.530
x:z          1   1675    1675   0.350  0.558
Residuals   34 162920    4792               
14 observations deleted due to missingness
  Tukey multiple comparisons of means
    95% family-wise confidence level

Fit: aov(formula = y ~ x * z)

$x
                       diff      lwr      upr     p adj
pollution-control -27.84621 -75.1554 19.46299 0.2399019

$z
                diff       lwr      upr    p adj
male-female 15.26331 -36.56132 67.08793 0.553451

$`x:z`
                                      diff        lwr       upr     p adj
pollution:female-control:female   7.539945 -189.52938 204.60927 0.9995968
control:male-control:female      23.656420  -55.17131 102.48415 0.8490505
pollution:male-control:female   -14.645628  -95.71535  66.42410 0.9612648
control:male-pollution:female    16.116475 -176.97124 209.20419 0.9958814
pollution:male-pollution:female -22.185573 -216.19937 171.82823 0.9895988
pollution:male-control:male     -38.302048 -109.14587  32.54178 0.4719761

[1] "common spiny"
[1] "Y= 95 Mo ug/g"
[1] "X= treatment"
[1] "Z= sex"
            Df Sum Sq Mean Sq F value Pr(>F)
x            1   3.99   3.986   0.980  0.331
z            1   3.12   3.119   0.767  0.389
x:z          1   2.17   2.172   0.534  0.471
Residuals   27 109.85   4.068               
21 observations deleted due to missingness
  Tukey multiple comparisons of means
    95% family-wise confidence level

Fit: aov(formula = y ~ x * z)

$x
                        diff       lwr      upr     p adj
pollution-control -0.7361404 -2.262194 0.789913 0.3310763

$z
                  diff       lwr      upr     p adj
male-female -0.6874504 -2.386196 1.011295 0.4136302

$`x:z`
                                      diff       lwr      upr     p adj
pollution:female-control:female -1.9857143 -7.886589 3.915160 0.7939575
control:male-control:female     -1.0548810 -3.680052 1.570290 0.6927751
pollution:male-control:female   -1.3493506 -4.018121 1.319420 0.5200268
control:male-pollution:female    0.9308333 -4.814318 6.675984 0.9703339
pollution:male-pollution:female  0.6363636 -5.128840 6.401568 0.9902062
pollution:male-control:male     -0.2944697 -2.598549 2.009609 0.9849908

[1] "common spiny"
[1] "Y= 111 Cd ug/g"
[1] "X= treatment"
[1] "Z= sex"
            Df Sum Sq Mean Sq F value Pr(>F)
x            1  0.624  0.6236   1.122  0.297
z            1  0.002  0.0016   0.003  0.957
x:z          1  0.114  0.1138   0.205  0.654
Residuals   34 18.892  0.5557               
14 observations deleted due to missingness
  Tukey multiple comparisons of means
    95% family-wise confidence level

Fit: aov(formula = y ~ x * z)

$x
                       diff        lwr       upr     p adj
pollution-control 0.2655667 -0.2438843 0.7750178 0.2969001

$z
                  diff        lwr       upr     p adj
male-female 0.01396609 -0.5441096 0.5720418 0.9597364

$`x:z`
                                       diff        lwr       upr     p adj
pollution:female-control:female -0.07381750 -2.1959668 2.0483318 0.9996968
control:male-control:female     -0.03783619 -0.8866959 0.8110235 0.9993635
pollution:male-control:female    0.26620655 -0.6067962 1.1392093 0.8429160
control:male-pollution:female    0.03598131 -2.0432918 2.1152544 0.9999626
pollution:male-pollution:female  0.34002405 -1.7492217 2.4292698 0.9711681
pollution:male-control:male      0.30404274 -0.4588419 1.0669274 0.7060267

[1] "common spiny"
[1] "Y= 88 Sr ug/g"
[1] "X= treatment"
[1] "Z= sex"
            Df Sum Sq Mean Sq F value Pr(>F)
x            1    333   333.4   1.363  0.254
z            1    647   646.8   2.645  0.117
Residuals   24   5870   244.6               
25 observations deleted due to missingness
  Tukey multiple comparisons of means
    95% family-wise confidence level

Fit: aov(formula = y ~ x * z)

$x
                       diff       lwr      upr    p adj
pollution-control -7.151225 -19.79305 5.490602 0.254475

$z
                diff       lwr      upr     p adj
male-female 10.55384 -4.387205 25.49488 0.1578397

[1] "common spiny"
[1] "Y= 47 Ti ug/g"
[1] "X= treatment"
[1] "Z= sex"
            Df Sum Sq Mean Sq F value Pr(>F)
x            1    550   550.3   0.810  0.381
z            1      1     0.8   0.001  0.974
Residuals   17  11549   679.4               
32 observations deleted due to missingness
  Tukey multiple comparisons of means
    95% family-wise confidence level

Fit: aov(formula = y ~ x * z)

$x
                       diff       lwr      upr     p adj
pollution-control -10.54424 -35.26139 14.17291 0.3806725

$z
                  diff       lwr      upr     p adj
male-female -0.4332386 -31.17481 30.30833 0.9766259

Show in New Window
Error in `$<-.data.frame`(`*tmp*`, "Name", value = "51 V ug/g") : 

The desired ideal result will be something like this (or any other reasonable summary of this long, almost unreadable list):

metal Y effect pv Z effect pv Y:Z effect tukey combination #1 tukey combination #2 tukey combination #3
51V ------ --------- ------- ---------- ---------- ------------
27Al ------ --------- ------- ---------- ---------- ------------
65Cu ------ --------- ------- ---------- ---------- ------------

the closest I got was using this answer to a similar question: https://stackoverflow.com/a/75540191/18907906 , which I really liked because of the minimalist writing, but I didn't find the way to fit it to two-way ANOVA, so I couldn't make this last cade row to work properly.

Other limitations I couldn't solve were:

  • not all of the columns are supposed to be analyzed so solutions that run through all columns raise an error.
  • some of the combinations inside the "loop" (aka the combination of specific X from the list with the specific Y and Z chosen) are not compatible, so I need the function to skip them or write NA.
  • I wish for the function to be as short and as general as possible since I would like to use it for many combinations and sometimes inside another function.

a sample of the dataframe:

structure(list(ID = c("o35Fe", "o22Fe", "o21Fe", "o28Fe", "o20Fe", 
NA, NA, NA, NA), age = c(4L, 4L, 4L, 4L, 3L, NA, NA, NA, NA), 
    `individual weight gr` = c(19.2, 20.8, 24.3, 24.4, 19.1, 
    NA, NA, NA, NA), texa = c("Acrocephalus stentoreus", "Acrocephalus stentoreus", 
    "Acrocephalus stentoreus", "Acrocephalus stentoreus", "Acrocephalus stentoreus", 
    NA, NA, NA, NA), treatment = c("pollution", "pollution", 
    "pollution", "pollution", "pollution", NA, NA, NA, NA), year = c(2018L, 
    2018L, 2018L, 2018L, 2018L, NA, NA, NA, NA), `9 Be ug/g` = c(NA, 
    NA, NA, NA, NA, NA, NA, NA, NA), `27 Al ug/g` = c(NA, NA, 
    NA, NA, NA, NA, NA, NA, NA), `47 Ti ug/g` = c(0.82, NA, NA, 
    1.1, NA, NA, NA, NA, NA), `51 V ug/g` = c(0.26, NA, NA, 0.19, 
    NA, NA, NA, NA, NA), `52 Cr ug/g` = c(1.22, NA, NA, 1.395, 
    NA, NA, NA, NA, NA), `55 Mn ug/g` = c(8.46, 38.93, 248.44, 
    13.1, 20.03, NA, NA, NA, NA), `56 Fe ug/g` = c(NA, NA, NA, 
    NA, NA, NA, NA, NA, NA), `59 Co ug/g` = c(NA, NA, NA, NA, 
    NA, NA, NA, NA, NA), `60 Ni ug/g` = c(-1.4, NA, NA, -0.95, 
    NA, NA, NA, NA, NA), `63 Cu ug/g` = c(0.91, 8.915, 79.715, 
    4.15, 31.10333333, NA, NA, NA, NA), `66 Zn ug/g` = c(95.64, 
    NA, NA, 106.03, NA, NA, NA, NA, NA), `75 As ug/g` = c(0.13, 
    NA, NA, 0.17, NA, NA, NA, NA, NA), `78 Se ug/g` = c(NA, NA, 
    NA, NA, NA, NA, NA, NA, NA), `88 Sr ug/g` = c(35.87, NA, 
    NA, 8.205, NA, NA, NA, NA, NA), `95 Mo ug/g` = c(0.28, NA, 
    NA, 0.05, NA, NA, NA, NA, NA), `107 Ag ug/g` = c(NA, NA, 
    NA, NA, NA, NA, NA, NA, NA), `111 Cd ug/g` = c(0.07, NA, 
    NA, 0.11, NA, NA, NA, NA, NA), c(NA, NA, NA, NA, NA, NA, 
    NA, NA, NA), ` 1` = c(NA, NA, NA, NA, NA, NA, NA, NA, NA), 
    ` 2` = c(NA, NA, NA, NA, NA, NA, NA, NA, NA), ` 3` = c(NA, 
    NA, NA, NA, NA, NA, NA, NA, NA), ` 4` = c(NA, NA, NA, NA, 
    NA, NA, NA, NA, NA), ` 5` = c(NA, NA, NA, NA, NA, NA, NA, 
    NA, NA), ` 6` = c(NA, NA, NA, NA, NA, NA, NA, NA, NA), ` 7` = c(NA, 
    NA, NA, NA, NA, NA, NA, NA, NA), ` 8` = c(NA, NA, NA, NA, 
    NA, NA, NA, NA, NA), ` 9` = c(NA, NA, NA, NA, NA, NA, NA, 
    NA, NA), ` 10` = c(NA, NA, NA, NA, NA, NA, NA, NA, NA), ` 11` = c(NA, 
    NA, NA, NA, NA, NA, NA, NA, NA), ` 12` = c(NA, NA, NA, NA, 
    NA, NA, NA, NA, NA), ` 13` = c(NA, NA, NA, NA, NA, NA, NA, 
    NA, NA), ` 14` = c(NA, NA, NA, NA, NA, NA, NA, NA, NA), ` 15` = c(NA, 
    NA, NA, NA, NA, NA, NA, NA, NA), ` 16` = c(NA, NA, NA, NA, 
    NA, NA, NA, NA, NA), ` 17` = c(NA, NA, NA, NA, NA, NA, NA, 
    NA, NA), ` 18` = c(NA, NA, NA, NA, NA, NA, NA, NA, NA), ` 19` = c(NA, 
    NA, NA, NA, NA, NA, NA, NA, NA), ` 20` = c(NA, NA, NA, NA, 
    NA, NA, NA, NA, NA), ` 21` = c(NA, NA, NA, NA, NA, NA, NA, 
    NA, NA), ` 22` = c(NA, NA, NA, NA, NA, NA, NA, NA, NA), ` 23` = c(NA, 
    NA, NA, NA, NA, NA, NA, NA, NA), ` 24` = c(NA, NA, NA, NA, 
    NA, NA, NA, NA, NA), ` 25` = c(NA, NA, NA, NA, NA, NA, NA, 
    NA, NA), ` 26` = c(NA, NA, NA, NA, NA, NA, NA, NA, NA), `114 Cd ug/g` = c(NA, 
    NA, NA, NA, NA, NA, NA, NA, NA), `118 Sn ug/g` = c(NA, NA, 
    NA, NA, NA, NA, NA, NA, NA), `121 Sb ug/g` = c(NA, NA, NA, 
    NA, NA, NA, NA, NA, NA), `137 Ba ug/g` = c(NA, NA, NA, NA, 
    NA, NA, NA, NA, NA), `208 Pb ug/g` = c(NA, NA, NA, NA, NA, 
    NA, NA, NA, NA), `trophic level` = c(NA, NA, NA, NA, NA, 
    NA, NA, NA, NA), `tarsus mm` = c(NA, NA, NA, NA, NA, NA, 
    NA, NA, NA), Rec = c(NA, NA, NA, NA, NA, NA, NA, NA, NA), 
    Ring = c(NA, NA, NA, NA, NA, NA, NA, NA, NA), Species = c(NA, 
    NA, NA, NA, NA, NA, NA, NA, NA), `Species 1` = c(NA, NA, 
    NA, NA, NA, NA, NA, NA, NA), Sex = c(NA, NA, NA, NA, NA, 
    NA, NA, NA, NA), Wing = c(NA, NA, NA, NA, NA, NA, NA, NA, 
    NA), Tail = c(NA, NA, NA, NA, NA, NA, NA, NA, NA), Fat = c(NA, 
    NA, NA, NA, NA, NA, NA, NA, NA), Time = c(NA, NA, NA, NA, 
    NA, NA, NA, NA, NA), Place = c(NA, NA, NA, NA, NA, NA, NA, 
    NA, NA), `Coor N` = c(NA, NA, NA, NA, NA, NA, NA, NA, NA), 
    `Coor E` = c(NA, NA, NA, NA, NA, NA, NA, NA, NA), Ringer = c(NA, 
    NA, NA, NA, NA, NA, NA, NA, NA), Remarks = c(NA, NA, NA, 
    NA, NA, NA, NA, NA, NA), `Incubation P` = c(NA, NA, NA, NA, 
    NA, NA, NA, NA, NA), TibuaID = c(NA, NA, NA, NA, NA, NA, 
    NA, NA, NA), RecID = c(NA, NA, NA, NA, NA, NA, NA, NA, NA
    ), userID = c(NA, NA, NA, NA, NA, NA, NA, NA, NA), speciesID = c(NA, 
    NA, NA, NA, NA, NA, NA, NA, NA), `Digest L` = c(NA, NA, NA, 
    NA, NA, NA, NA, NA, NA)), row.names = c(NA, -9L), class = "data.frame")
0

There are 0 answers