Multiple t-test comparisons


I would like to know how I can use t.test or pairwise.t.test to make multiple comparisons between gene combinations. First, how can I compare all combinations Gene 1 vs. Gene 3, Gene 3 vs Gene 4, etc.? Second, how would I be able to only compare combinations of Gene 1 with the other genes?

Do I need to make a function for this?

Also, given the dataset below, where the genes do not all have the same number of observations, what can I do when I get "arguments are not the same length"?

Thanks.

Gene   S1      S2      S3      S4      S5      S6     S7
1   20000   12032   23948    2794    5870     782    699
3   15051   17543   18590   21005   22996   26448
4   35023   43092   41858   39637   40933   38865

There are 2 answers

www On BEST ANSWER

I think that @akrun has a great answer to help on the programming side of this, but since this question is also about statistics, it seems important to mention that running many t-tests inflates the chance of a false positive, so it may not be a statistically sound analysis, depending on the number of comparisons in your full dataset. Please keep that in mind. At the very least, applying a Bonferroni correction, or similar, would be recommended here, so I've added that to @akrun's code.
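To make the correction concrete, here is a minimal sketch of what Bonferroni does to a set of p-values. The three raw p-values are made up purely for illustration; `p.adjust` is the base R function that `pairwise.t.test` uses internally.

```r
# Hypothetical unadjusted p-values from three comparisons (made-up numbers).
raw_p <- c(0.01, 0.02, 0.03)

# Bonferroni: multiply each p-value by the number of tests, capped at 1.
bonf <- p.adjust(raw_p, method = "bonferroni")

# The same calculation written out by hand.
manual <- pmin(1, raw_p * length(raw_p))

# Holm is a step-down alternative that is never more conservative than Bonferroni.
holm <- p.adjust(raw_p, method = "holm")

print(data.frame(raw_p, bonf, manual, holm))
```

The `bonf` and `manual` columns should agree, which shows that the correction is just a multiplication by the number of comparisons.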

Prior to running the t-tests, it may also be best to run an ANOVA to see if there are any differences overall. Columbia University has a helpful explanation of this approach on their stats page.

That said, I'll show you how to do both for the sake of answering the programming aspect of the question, but for those looking up the same question, please carefully review your methods before using this answer.

I've displayed the following results without scientific notation for the benefit of those less familiar with it, via options(scipen=999) in R.

Pre-t-test ANOVA:

library(tidyr)  # provides gather()
summary(aov(val ~ as.factor(Gene), data=gather(df, key, val, -Gene)))

                Df     Sum Sq    Mean Sq F value     Pr(>F)    
as.factor(Gene)  2 2627772989 1313886494   34.49 0.00000245 ***
Residuals       15  571374752   38091650                       
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

T-test:

library(broom)
library(dplyr)
library(tidyr)

gather(df, key, val, -Gene) %>% 
  do(data.frame(tidy(pairwise.t.test(.$val, .$Gene, p.adjust.method="bonferroni"))))

  group1 group2       p.value
1      3      1 0.05691493022
2      4      1 0.00000209244
4      4      3 0.00018020669

EDIT:

For these tests, it doesn't particularly matter if the genes do not all have the same number of observations. The code I've outlined above will still run. However, it's generally good practice in R to set blank or null values to NA. See this SO answer for a way to change values to NA.
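As a rough sketch of the NA approach, the data from the question could be entered like this, assuming the seventh sample is simply missing for genes 3 and 4 (that is my reading of the table, not something the question states). The long-format reshaping is done in base R here, and the names `long` and `res` are just illustrative.

```r
# Data from the question; missing S7 values for genes 3 and 4 are entered as NA
# (assumption: those cells are missing rather than shifted).
df <- data.frame(
  Gene = c(1, 3, 4),
  S1 = c(20000, 15051, 35023),
  S2 = c(12032, 17543, 43092),
  S3 = c(23948, 18590, 41858),
  S4 = c(2794, 21005, 39637),
  S5 = c(5870, 22996, 40933),
  S6 = c(782, 26448, 38865),
  S7 = c(699, NA, NA)
)

# Long format: one row per (gene, sample) value.
# unlist() walks the sample columns one at a time, so Gene repeats as 1, 3, 4, 1, 3, 4, ...
long <- data.frame(
  Gene = factor(rep(df$Gene, times = ncol(df) - 1)),
  val  = unlist(df[-1], use.names = FALSE)
)

# pairwise.t.test() skips the NA values when computing group means, SDs and counts.
res <- pairwise.t.test(long$val, long$Gene, p.adjust.method = "bonferroni")
print(res)
```

With the NAs in place, every column has the same length, so the "arguments are not the same length" problem should not come up when building the data frame.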

If you'd like to limit your t-tests to only a few gene comparisons, for example, gene 1 vs. gene 3 and gene 1 vs. gene 4, but not gene 3 vs. gene 4, the simplest way is to still use the code above. Instead of applying p-value correction inside the pairwise.t.test function, however, just apply it afterward on only the p-values you want to assess. Note that pairwise.t.test applies a Holm correction by default, so the correction has to be switched off explicitly with p.adjust.method="none". Try this:

res <- gather(df, key, val, -Gene) %>% 
  do(data.frame(tidy(pairwise.t.test(.$val, .$Gene, p.adjust.method="none"))))

res <- res[res$group1==1 | res$group2 ==1,]

res$p.value <-  p.adjust(res$p.value, method = "bonferroni")

print(res)

  group1 group2        p.value
1      3      1 0.015989134399
2      4      1 0.000001458475

Note that the above applies p-value correction only to the tests that we've subset and want to assess, which for this example is every combination that involves gene 1.
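Since the question also asks about plain t.test, here is a minimal sketch of the same "gene 1 vs. everything else" idea without pairwise.t.test. The list `obs` and the names `reference`, `others` and `raw_p` are my own illustrative choices. Keep in mind that t.test defaults to Welch's unequal-variance test on just the two groups, whereas pairwise.t.test pools the standard deviation across all groups by default, so the p-values will not match the ones above.

```r
# Observations per gene, copied from the question; lengths are allowed to differ.
obs <- list(
  `1` = c(20000, 12032, 23948, 2794, 5870, 782, 699),
  `3` = c(15051, 17543, 18590, 21005, 22996, 26448),
  `4` = c(35023, 43092, 41858, 39637, 40933, 38865)
)

reference <- "1"                           # gene to compare everything against
others <- setdiff(names(obs), reference)   # all remaining genes

# One Welch two-sample t-test per comparison; keep only the raw p-value.
raw_p <- sapply(others, function(g) t.test(obs[[reference]], obs[[g]])$p.value)

# Correct only for the comparisons that were actually run.
res <- data.frame(
  group1  = others,
  group2  = reference,
  p.value = p.adjust(raw_p, method = "bonferroni"),
  row.names = NULL
)
print(res)
```

Because each gene is its own vector in a list, unequal lengths are not an issue here and no NA padding is needed.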

Severin Pappadeux On

OK, here is another piece of statistical advice. You might want to take a look at Hotelling's T-squared test, a generalization of the t-statistic to multivariate distributions.

Packages: ICSNP with tutorial here, or Hotelling