I have a list of ranges of genomic regions derived from a series of patients.
> head(dotoo)
GRanges with 6 ranges and 3 metadata columns:
seqnames ranges strand | Id CN Histology
<Rle> <IRanges> <Rle> | <factor> <factor> <factor>
[1] 3 [167946693, 168005541] * | 9 3 MD
[2] 3 [189907623, 189954633] * | 9 3 MD
[3] 6 [132274121, 132384438] * | 9 3 MD
[4] 11 [ 67685096, 70138399] * | 9 4 MD
[5] 12 [ 53859037, 53927595] * | 9 3 MD
[6] 15 [ 19830049, 20089383] * | 9 1 MD
When I plot the genomic aberrations using
autoplot(dotoo, aes(fill=as.factor(Id), color=as.factor(Id)))
I see many overlapping regions, see image
How do find out which regions overlaps between at least 3 patients, and have shared CN
?
Basically, if you look at the picture, how do I find the regions that "stack" over, and only the part that is shared? Is there a way at all?
Get a list of 'disjoint' regions (maybe this is not what you want? other options are
reduce
and just the originaldotoo
object without this step at all)find overlaps between the original and each of the disjoint regions
split an index into the overlaps based on based on subject and CN
filter these to the ones satisfying your condition
filt
is now a list of indexes intoolap
. You could create a GRangesList with overlapping elements asAsk questions about Bioconductor packages on the Bioconductor mailing list (no subscription required).