Finding overlapping regions of chromosome with GRanges

1.9k views Asked by At

I have a list of ranges of genomic regions derived from a series of patients.

> head(dotoo)
GRanges with 6 ranges and 3 metadata columns:
    seqnames                 ranges strand |       Id       CN Histology
       <Rle>              <IRanges>  <Rle> | <factor> <factor>  <factor>
[1]        3 [167946693, 168005541]      * |        9        3        MD
[2]        3 [189907623, 189954633]      * |        9        3        MD
[3]        6 [132274121, 132384438]      * |        9        3        MD
[4]       11 [ 67685096,  70138399]      * |        9        4        MD
[5]       12 [ 53859037,  53927595]      * |        9        3        MD
[6]       15 [ 19830049,  20089383]      * |        9        1        MD

When I plot the genomic aberrations using

autoplot(dotoo, aes(fill=as.factor(Id), color=as.factor(Id)))

I see many overlapping regions, see image

enter image description here

How do find out which regions overlaps between at least 3 patients, and have shared CN?

Basically, if you look at the picture, how do I find the regions that "stack" over, and only the part that is shared? Is there a way at all?

1

There are 1 answers

0
Martin Morgan On

Get a list of 'disjoint' regions (maybe this is not what you want? other options are reduce and just the original dotoo object without this step at all)

d = disjoint(dotoo)

find overlaps between the original and each of the disjoint regions

olap = findOverlaps(query=dotoo, subject=d)

split an index into the overlaps based on based on subject and CN

splt = split(seq_along(olap), list(subjectHits(olap), dotoo$CN[queryHits(olap)]))

filter these to the ones satisfying your condition

filt = Filter(function(x) length(x) >= 3, splt)

filt is now a list of indexes into olap. You could create a GRangesList with overlapping elements as

idx = unlist(filt)
grp = rep(seq_along(filt), sapply(filt, length))
splitAsList(dotoo[queryHits(olap)[idx]], grp)

Ask questions about Bioconductor packages on the Bioconductor mailing list (no subscription required).