reporting the best alignment with pairwise2

855 views Asked by At

I have a fastq file of reads, say "reads.fastq". I want to align the sequnces to a string saved as a fasta file ref.faa. I am using the following code for this

reads_array = []

for x in Bio.SeqIO.parse("reads.fastq","fastq"):
    reads_array.append(x)

for x in Bio.SeqIO.parse("ref.faa","fasta"):
    refseq = x

result = open("alignments_G10_final","w")

aligned_reads = []

for x in reads_array:
    alignments =pairwise2.align.globalms(str(refseq.seq).upper(),str(x.seq),2,-1,-5,-0.05)
    for a in alignments:
        result.write(format_alignment(*a))
        aligned_reads.append(x)

But I want to report only the best alignment for each read. How can I choose this alignment from the scores in a[2]. I want to choose the alignment(s) with the highest value of a[2]

2

There are 2 answers

0
cnluzon On BEST ANSWER

You could sort the alignments according to a[2]:

for x in reads_array:
  alignments = pairwise2.align.globalms(
      str(refseq.seq).upper(), str(x.seq), 2, -1, -5, -0.05)
  sorted_alignments = sorted(alignments, key=operator.itemgetter(2))
        
  result.write(format_alignment(*sorted_alignments[0]))
  aligned_reads.append(x)
0
SummerEla On

I know this is an old question, but for anyone still looking for the correct answer, add one_alignment_only=True argument to your align method:

alignments =pairwise2.align.globalms(str(refseq.seq).upper(),
    str(x.seq),
    2,-1,-5,-0.05,
    one_alignment_only=True)

I had to do some digging around in the documentation to find it, but this gives back the optimal score.