How to interpret DESeq results with LRT vs. with Wald's test

49 views Asked by At

I am new to the field of RNA-Seq and wanted to ask for advice concerning the proper use of the DESeq() function.

Briefly, my experimental set up is as follows: I have 30 mice in 5 different treatment groups (n=6/group) and a known batch effect (sequencing was performed at two different days, batch effect confirmed visually via PCA & distance matrix after using limma::removeBatchEffect on the vst-transformed matrix, as referenced in the DESeq2 vignette (Why after VST are there still batches in the PCA plot?). According to your vignette (Quick Start), i have defined my design matrix like this:

ddsMat <- DESeqDataSetFromMatrix(countData = countdata,
                             colData  = coldata,
                             design = ~ seq_day + group) #known batch = sequencing days

After reading your vignette (Likelihood ratio test) and several blog entries in this forum, I have run the DESeq() function in two different ways to explore the differences:

dds.LRT <- DESeq(ddsFilteredMat, test="LRT", reduced=~seq_day) #LRT
dds<- DESeq(ddsFilteredMat) #Wald

As I understood it, the LRT version can be comparable to an ANOVA as it tests for differences across all of the groups, after removing the effect exerted by the batch effect whereas Wald's test is used for direct pairwise comparisons between 2 groups. Consequently, I get the same DEGs, LFC and pvals for any contrast extracted from dds.LRT, since it always compares this: LRT p-value: '~ seq_mice_cage + group' vs '~ seq_mice_cage'.

#extract results
res.groupA_LRT <- results(dds.LRT, contrast = c("group", "A", "control"))
res.groupB_LRT <- results(dds.LRT, contrast = c("group", "B", "control"))
res.groupC_LRT <- results(dds.LRT, contrast = c("group", "C", "control"))
res.groupD_LRT <- results(dds.LRT, contrast = c("group", "D", "control"))

#show top 6 rows per result
data.frame(rbind(cbind(head(res.groupA_LRT), group = rep("A", 6), rank = seq(1,6)), 
                            cbind(head(res.groupB_LRT), group = rep("B", 6), rank = seq(1,6)), 
                            cbind(head(res.groupC_LRT), group = rep("C", 6), rank = seq(1,6)), 
                            cbind(head(res.groupD_LRT), group = rep("D", 6), rank = seq(1,6))))

output:

                       baseMean log2FoldChange      lfcSE      stat      pvalue       padj group rank
ENSMUSG00000025902     44.22274    -0.40249365 0.35558183  8.588492 0.072249833 0.22728207     A    1
ENSMUSG00000033845   1310.06464     0.36575419 0.15746631  6.559806 0.161060931 0.36511564     A    2
ENSMUSG00000102275     16.07641     1.30656064 0.79514588  5.777069 0.216426997 0.43092779     A    3
ENSMUSG00000025903   2187.23707     0.14752514 0.14539018  7.719259 0.102420605 0.27909782     A    4
ENSMUSG00000033813    228.33252    -0.24121186 0.19922834 17.043843 0.001895401 0.01960863     A    5
ENSMUSG00000033793   1517.61889    -0.19788049 0.15639653  2.544884 0.636617125 0.77272669     A    6
ENSMUSG00000025902.1   44.22274     0.09838015 0.36613901  8.588492 0.072249833 0.22728207     B    1
ENSMUSG00000033845.1 1310.06464     0.32342520 0.16436483  6.559806 0.161060931 0.36511564     B    2
ENSMUSG00000102275.1   16.07641     2.02343897 0.82375227  5.777069 0.216426997 0.43092779     B    3
ENSMUSG00000025903.1 2187.23707     0.21157088 0.15173852  7.719259 0.102420605 0.27909782     B    4
ENSMUSG00000033813.1  228.33252     0.27254669 0.20653754 17.043843 0.001895401 0.01960863     B    5
ENSMUSG00000033793.1 1517.61889    -0.09809705 0.16316122  2.544884 0.636617125 0.77272669     B    6
ENSMUSG00000025902.2   44.22274     0.23444799 0.36404826  8.588492 0.072249833 0.22728207     C    1
ENSMUSG00000033845.2 1310.06464     0.18044040 0.16541518  6.559806 0.161060931 0.36511564     C    2
ENSMUSG00000102275.2   16.07641     1.42749218 0.81373353  5.777069 0.216426997 0.43092779     C    3
ENSMUSG00000025903.2 2187.23707     0.22914511 0.15270455  7.719259 0.102420605 0.27909782     C    4
ENSMUSG00000033813.2  228.33252    -0.08556367 0.20771886 17.043843 0.001895401 0.01960863     C    5
ENSMUSG00000033793.2 1517.61889    -0.06247866 0.16435798  2.544884 0.636617125 0.77272669     C    6
ENSMUSG00000025902.3   44.22274    -0.16301925 0.23952672  8.588492 0.072249833 0.22728207     D    1
ENSMUSG00000033845.3 1310.06464     0.02632379 0.10753635  6.559806 0.161060931 0.36511564     D    2
ENSMUSG00000102275.3   16.07641     0.62498726 0.49334192  5.777069 0.216426997 0.43092779     D    3
ENSMUSG00000025903.3 2187.23707     0.25842491 0.09947068  7.719259 0.102420605 0.27909782     D    4
ENSMUSG00000033813.3  228.33252    -0.22232786 0.13551622 17.043843 0.001895401 0.01960863     D    5
ENSMUSG00000033793.3 1517.61889    -0.05636333 0.10730159  2.544884 0.636617125 0.77272669     D    6

I have defined my contrasts from the Wald-dds object as follows:

res.groupA_Wald <- results(dds, name = "group_A_vs_control")
res.groupB_Wald <- results(dds, name = "group_B_vs_control")
res.groupC_Wald <- results(dds, name = "group_C_vs_control")
res.groupD_Wald <- results(dds, name = "group_D_vs_control")

(i) How can I put these two dds results to use in my downstream analyses / what are the reasons for me to choose the LRT output for my set up (5 groups; 4 treatments vs. 1 control)? Since the p.values of the LRT outputs are identical regardless of the contrast (since it only represents ANY significant DEGs between the groups), is it valid to use the LRT-generated p.values as cut-off for the Heatmap/Venn diagram?

(ii) Tying in with the "ANOVA-comparison", is it valid to see the Wald-dds object as a post-hoc test? If so, is there a way to approximate a post-hoc correction similar to Tukey's correction (i.e. less conservative as basic differences have already been confirmed via ANOVA)?

Thank you very much in advance. Best, Luise

0

There are 0 answers