I've done differential expression on a huge list of genes and I'm selecting only the significant ones among an external list of genes that I want to study. When I represent my significant ones on the barplot, the p-values I obtained from the moderated t-test (BH correction) with limma, are not the same; for the barplot, I used the ggpubr's function stat_compare_means() and the p-values appearing are totally different and abberrant from the ones obtained with limma.
Does anyone know if this is normal?
The genes I plotted should be correct, I checked multiple times.
Thank you
Limma uses moderated t-test which is specifically designed for data coming from RNA microarrays and has been applied to other genomic and sequencing technologies. It works when mean of features(gene expression /probe signals) shows a binomial/Poisson distribution where the pool of features you are working with has a huge range, where usually highly expressed genes show high standard err / variance.
In the case of highly expressed genes, t-test fails to provide an accurate measure of significance as the p-value depends on the distribution of that gene values despite the difference in means between the two sample groups. (afterall , it is a parametric test)
The moderated t-test in limma takes into consideration mean/variance of different genes coming from the same samples and technology by use of model fitting to successfully select a pool of genes that you can call significant.
In this case, you can use wilcox-test or to add pvalues manually, you can get the limma output results in tibble and try what is shown here: https://www.datanovia.com/en/blog/how-to-add-p-values-onto-basic-ggplots/
I hope this helps!
References:
https://support.bioconductor.org/p/47765/
https://online.stat.psu.edu/stat555/node/46/