How to use pysam.view to emulate all functions of samtools view

1.1k views Asked by At

I am trying to use pysam.view() to filter out certain alignments from a BAM file. The problem I am facing is how to include several regions in the filter.

pysam.view() emulates the samtools view command which allows one to enter several regions separated by the space character, eg:

samtools view opts bamfile chr1:2010000-20200000 chr2:2010000-20200000 

But the corresponding pysam.view call:

pysam.view(ops, bamfile, '1:2010000-20200000 2:2010000-20200000')

does not work. It does not return any alignments. I'm quite sure the problem lies in how to specify the list of regions, since the following command works fine:

pysam.view(ops, bamfile, '1:2010000-20200000')

and returns alignments.

My question is: does pysam.view support multiple regions and how does one specify this list? I have searched for documentation regarding this but not found anything.

1

There are 1 answers

0
Brunox13 On

The short answer to your question is that the format you'd use is

pysam.view(ops, bamfile, '1:2010000-20200000','2:2010000-20200000')

(Also note that the number indicating the end of each of your regions is ~10x larger than the beginning - it seems you might have intended 2010000-2020000 instead.)

I have tested it using the following code:

import pysam

my_bam_file = '/path/to/my/bam_file.bam'
alignments1 = pysam.view(my_bam_file, '1:2010000-4000000')
alignments2 = pysam.view(my_bam_file, '1:5000000-6000000')
alignments3 = pysam.view(my_bam_file, '1:2010000-4000000', '1:5000000-6000000')

print(len(alignments1) + len(alignments2) == len(alignments3))

[Output:] True

However, this way of extracting alignments is not very efficient, as the output you get is one large str, instead of individual alignments. To get a list of separate alignments instead, use the following code:

import pysam

my_bam_file = '/path/to/my/bam_file.bam'
imported = pysam.AlignmentFile(my_bam_file, mode = 'rb')
regions = ('1:2010000-20200000','2:2010000-20200000')
alignments = []
for region in regions:
    bam = imported.fetch(region = region, until_eof = True)
    alignments.extend([alignment for alignment in bam])

Each element of alignment then ends up being a pysam.AlignedSegment object, with which you can work further using the functions in pysam API.