Negative values after Limma's batch effect correction

I have several RNA-seq data sets. Using a deconvolution approach, I ran a cell type enrichment analysis on each of them and then combined the results into one data frame, with 1000+ samples in columns and 38 cell types in rows. The datasets come from different cancer articles. So, naturally, before visualizing the data with a volcano plot or t-SNE, I need to correct batch effects for cancer type AND dataset so that the source won't affect the results. I did that using the following code:

scores.batch = limma::removeBatchEffect(scores, batch = metadata$Cancer_Type, batch2 = metadata$dataset)

However, for some cell types in some samples I got a negative score, which makes no sense of course. Something is wrong.

dput of the scores matrix:

structure(c(0, 0.0853672252935787, 0.0472255148477786, 0.0505467828272972, 
0, 0.0308325695761715, 0, 0.157955518619051, 0.00989687292281167, 
0.03263377453636, 0.174135667551838, 0.0287360256296349, 0.0647519562755579, 
0, 0, 0, 0.0131324709303641, 0, 0, 0.131356081785285, 0, 0.0389487771231123, 
0.143102950679691, 0, 0, 0, 0, 0, 0.0120903254374909, 0, 0, 0.0146819273419876, 
0.00547214738400891, 0.0128837171879466, 0, 0, 0.0458955588957287, 
0, 0, 0.0132395370608289, 0, 0.188105588373935, 0.458389955317805, 
0, 0, 0, 0, 0.202322209601319, 0.0070140951370079, 0.0674561160550705, 
0.257105522741856, 0.0187125792218268, 0.132650077873857, 0.0464882832616245, 
0, 0.267398408455589, 0.257988913719892, 0, 0.0327859369672344, 
0.190621289930972, 0.00595393276866058, 0.0257929623669804, 0.0286417150045293, 
0.00692628582485207, 0, 0, 0, 0.103067773960521, 0.0254486580186314, 
0.0280937010981759, 0.0571003379986667, 0, 0.0129979208251825, 
0.0665627159432736, 0, 0, 0.224047712805128, 0, 0.0136182944729644, 
0.0432680414524333, 0.0399461338850251, 0.0292693281178669, 0.366507229736257, 
0, 0, 0, 0, 0, 0.00669552195301393, 0.0185218739472336, 0.0255519328964942, 
0, 0.0733344287554076, 0.0255903177243924, 0, 0.39146057499213, 
0.0111881442508292, 0, 0, 0.0511976959994528, 0, 0.00579928556115081, 
0.0732688902065305, 0, 0, 0, 0, 0, 0.0110070088566591, 0, 0, 
0.0106345827415758, 0, 0.0161657089454384, 0, 0, 0.181452946136114, 
0, 0, 0.0142759665417351, 0, 0.0462555010600369, 0.0827203733228943, 
0.0145026248884816, 0, 0.013260218865482, 0, 0.0933479614509043, 
0.00774602641682057, 0.0119668338387216, 0.129131677414995, 0.0239962329230613, 
0.0204322461340539, 0.0493841568846939, 0, 0, 0.121244199785082, 
0, 0.0121972859914857, 0.140024857933727, 0.174619321250637, 
0.220714394591806, 0.0357916262655448, 0.0545063692410225, 0, 
0, 0, 0.137161377602957, 0.00590382216872294, 0.0201750503599633, 
0.142034903521219, 0.0985879590151414, 0.0335131516620065, 0.090677099547935, 
0.00507741177479126, 0.037356635316015, 0.201168399838889, 0, 
0.0314008923657083, 0.365359437170722, 0, 0.0335843135244289, 
0.0715582133522154, 0, 0, 0, 0, 0.0474378875432263, 0.0209691496515952, 
0.0172794413473455, 0.0135720611847538, 0.033428514409707, 0.0105720693466205, 
0, 0, 0, 0.06773450392978, 0, 0, 0.00586743854873324, 0.0168258454272402, 
0.0210951521159853, 0.243788369183411, 0.0220752365135898, 0, 
0, 0, 0, 0.0306987963385464, 0, 0, 0.0214207906806456, 0, 0.00976336517826329, 
0.0271958080474159, 0.00901990828923357, 0.0107550873887369, 
0, 0, 0.0447193474549512, 0, 0.0893397248630643, 0.0628755200633895, 
0.00689545950391347, 0, 0, 0, 0, 0.0317975383226698, 0.0326506928899438, 
0.0585870944941104, 0.0325612902279177, 0.015309108818366, 0.00884806480492375, 
0, 0, 0.090018886142378, 0, 0, 0.0252109197429766, 0, 0.0575320783388593, 
0.0360786525651136, 0, 0, 0, 0, 0.033985164346289, 0.0224789756565266, 
0, 0.0110759152952279, 0.0117488957883667, 0.0308459132319819, 
0.0280619366351415, 0, 0.118913468206155, 0.104597268143716, 
0, 0, 0.0725014944946794, 0.0178909285824974, 0.107561160668656, 
0.103657490882649, 0.00912992981258696, 0, 0, 0, 0.0705798568396863, 
0.0358671574380446, 0.0436978038106949, 0.0947966583779633, 0.00754414348305365, 
0.0427209871099505, 0.0293558198269896, 0, 0, 0.186905499424745, 
0, 0.00921431451770207, 0.0392728923365106, 0.457917600754677, 
0.0346030375240686, 0, 0.00559973035365259, 0, 0, 0, 0.0752873255178603, 
0, 0, 0.0146401887588438, 0.0149177458753822, 0.0746262560762416, 
0.263898927149848, 0.00724132393694323, 0.0356656672388469, 0.408802700748097, 
0, 0.105516874044416, 0.0759265575312881, 0, 0, 0, 0, 0, 0, 0, 
0.112847654739959, 0.0142421329460783, 0.0261230668401576, 0, 
0.00638014939397572, 0.0315337646404048, 0.0165989987896856, 
0, 0.359720477023771, 0.119577703639366, 0, 0, 0.00856089558535689, 
0, 0.00683428737177429, 0.0668329268575581, 0, 0, 0, 0, 0.047734960135924, 
0, 0, 0, 0, 0, 0, 0, 0, 0.107598342312041, 0, 0, 0.0121935326110086, 
0, 0, 0.135919574921458, 0, 0, 0, 0, 0, 0.0128474874078213, 0, 
0, 0, 0, 0.0153109234303942, 0, 0, 0.158969240948426, 0, 0, 0, 
0, 0.0232230943661847, 0.140426779187137, 0, 0, 0, 0, 0.0336919132251274, 
0.0340940005017551, 0.00546712364093891, 0.013544098663573, 0.00839243775091744, 
0, 0.00548575092813788, 0, 0, 0.0553411208343392, 0, 0, 0.0125206755368664, 
0.0182216981318242, 0.121162914437175, 0.114036773041914, 0.0357279266394587, 
0, 0, 0, 0.211648365695196, 0, 0.0354172784066678, 0.169262066444321, 
0.0630110794062426, 0.0606400985951092, 0.101323746391281, 0, 
0.021421960793034, 0.288751473459872, 0, 0.024183015113392, 0.352175638353027, 
0, 0.0258095846797072, 0.0228475888849942, 0, 0, 0, 0, 0.0802318746420269, 
0, 0, 0.0209026176594105, 0.0167803844651298, 0.0668381647275034, 
0.0264858410661633, 0, 0.00902616117758849, 0.10613905468228, 
0, 0, 0.0868373941926339), .Dim = c(20L, 20L), .Dimnames = list(
    c("Adipocytes", "B-cells", "Basophils", "CD4+ memory T-cells", 
    "CD4+ naive T-cells", "CD4+ T-cells", "CD4+ Tcm", "CD4+ Tem", 
    "CD8+ naive T-cells", "CD8+ T-cells", "CD8+ Tcm", "Class-switched memory B-cells", 
    "DC", "Endothelial cells", "Eosinophils", "Epithelial cells", 
    "Fibroblasts", "Hepatocytes", "ly Endothelial cells", "Macrophages"
    ), c("Pt1", "Pt10", "Pt103", "Pt106", "Pt11", "Pt17", "Pt2", 
    "Pt24", "Pt26", "Pt27", "Pt28", "Pt29", "Pt31", "Pt36", "Pt37", 
    "Pt38", "Pt39", "Pt4", "Pt46", "Pt47")))

I tried again, correcting only for Cancer_Type, but the results are not good (t-SNE isn't clustering the data as I need).

What might be the problem here?

There is 1 answer

danlooo

Limma assumes that both batch effects are additive and therefore independent of each other. Writing limma::removeBatchEffect(x = scores, batch = cancer_type, batch2 = study) implies that there is no relation between the cancer type and the study dataset. However, it is very likely that one study covers one cancer type and another study covers a different one. If so, the assumptions of the limma model are probably violated.
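One quick way to see whether the two factors are entangled is to cross-tabulate them. The sketch below uses a made-up metadata frame (the cancer type and study labels are hypothetical, only the column names come from the question) and checks whether each cancer type comes from a single dataset:

```r
# Hypothetical metadata: the labels are invented for illustration,
# only the column names match the question.
metadata <- data.frame(
  Cancer_Type = c("BRCA", "BRCA", "LUAD", "LUAD", "SKCM", "SKCM"),
  dataset     = c("study1", "study1", "study2", "study2", "study3", "study3")
)

# Rows are cancer types, columns are datasets, cells are sample counts.
tab <- table(metadata$Cancer_Type, metadata$dataset)
print(tab)

# TRUE when every cancer type appears in exactly one dataset, i.e. the
# two factors carry overlapping information and are not independent.
nested <- all(rowSums(tab > 0) == 1)
print(nested)
```

If most cells of the table are zero, the two factors cannot be separated cleanly by an additive model.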

However, you can create just one batch and use this as a single batch argument:

metadata$merged_batch <- paste(metadata$Cancer_Type, metadata$dataset, sep = "_")

scores.batch = limma::removeBatchEffect(scores, batch = metadata$merged_batch)
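As for the negative scores themselves: the correction is a linear shift, so a score that is already 0 in a batch with a high average gets pushed below zero. The sketch below does not call limma. It reproduces by hand what I understand a single-factor correction with the default intercept-only design to do (subtract each batch's mean, add back the average of the batch means), on toy numbers invented for illustration:

```r
# Toy scores for one cell type: three samples from each of two batches.
# All values are invented for illustration.
scores <- matrix(c(0, 0.5, 0.7, 0, 0.02, 0.04), nrow = 1,
                 dimnames = list("B-cells", paste0("S", 1:6)))
batch <- factor(c("A", "A", "A", "B", "B", "B"))

# Mean score per batch for every row (cell type).
batch_means <- t(apply(scores, 1, function(row) tapply(row, batch, mean)))

# Shift every sample by its batch's offset from the average batch mean.
adjusted <- scores -
  batch_means[, as.character(batch), drop = FALSE] +
  rowMeans(batch_means)

print(round(adjusted, 2))

# S1 had a score of 0 in the high-scoring batch A, so it ends up
# negative after the shift.
print(any(adjusted < 0))
```

So negative values are an expected side effect of an additive correction on zero-bounded scores, not necessarily a sign that the call failed. Whether to clip them at zero or to transform the scores before correcting is a separate decision that depends on the downstream analysis.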