Codon alignment via Python?


I have pairs of coding DNA sequences on which I wish to perform pairwise codon alignments in Python. I have "half completed" the process.

So far..

  • I retrieve pairs of orthologous DNA sequences from GenBank using the Biopython package.
  • I translate the orthologous pairs into peptide sequences and then align them using the EMBOSS Needle program.

I wish to..

  • Transfer the gaps from the peptide sequences into the original DNA sequences.

Question

I would appreciate suggestions for programs/code (called from Python) that can transfer gaps from aligned peptide sequence pairs onto codons of the corresponding nucleotide sequence pairs. Or programs/code that can carry out the pairwise codon alignment from scratch.


4

There are 4 answers

3
Stylize On BEST ANSWER

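All you need to do is split the nucleotide sequence into triplets. Each amino acid corresponds to one triplet, and each gap in the peptide becomes three gap characters. The codon index must advance only at non-gap positions, so in Python:

codon = 0  # index of the next unused codon in the ungapped nucleotide sequence
for residue in aminoacid:
    if residue != "-":
        print(nucleotide[3 * codon:3 * codon + 3])
        codon += 1
    else:
        print("---")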
1
apai On

You can make a mapping from amino acids to codons, with an extra entry for your gap character:

codons = str.maketrans({'M' : 'ATG',
                        'R' : 'CGT',
                        ...,
                        '-' : '---'}) # Your missing character

peptide = 'M-R'
result = peptide.translate(codons)

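and then translate the full aligned peptide sequence. Note that such a table holds only one codon per amino acid, so the result will not match the original DNA wherever a synonymous codon was used.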
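
To illustrate the limitation of a fixed peptide-to-codon table, here is a minimal sketch. It assumes a toy table with arbitrarily chosen codons (not a recommended mapping) and an original sequence that uses a different arginine codon:

```python
# Toy table for illustration only: one arbitrarily chosen codon per residue.
codons = str.maketrans({'M': 'ATG', 'R': 'CGT', '-': '---'})

original_dna = 'ATGAGA'      # ATG = Met, AGA = Arg (synonymous with CGT)
aligned_peptide = 'M-R'

back_translated = aligned_peptide.translate(codons)
print(back_translated)       # ATG---CGT

# The arginine codon differs from the one in the original sequence.
print(back_translated.replace('-', '') == original_dna)   # False
```

If the original codons need to be preserved, the gaps have to be transferred onto the original DNA rather than regenerated from the peptide.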

1
hello_there_andy On

In the end I made my own Python function, thought I may as well share it.

It takes an aligned peptide sequence with gaps and the corresponding un-aligned nucleotide sequence and gives an aligned nucleotide sequence:

Function

def gapsFromPeptide( peptide_seq, nucleotide_seq ):
    """ Transfers gaps from aligned peptide seq into codon partitioned nucleotide seq (codon alignment) 
          - peptide_seq is an aligned peptide sequence with gaps that need to be transferred to nucleotide seq
          - nucleotide_seq is an un-aligned dna sequence whose codons translate to peptide seq"""
    def chunks(l, n):
        """ Yield successive n-sized chunks from l."""
        for i in range(0, len(l), n):  # range works on Python 3 (xrange is Python 2 only)
            yield l[i:i+n]
    codons = [codon for codon in chunks(nucleotide_seq,3)]  #splits nucleotides into codons (triplets) 
    gappedCodons = []
    codonCount = 0
    for aa in peptide_seq:  #adds '---' gaps to nucleotide seq corresponding to peptide
        if aa!='-':
            gappedCodons.append(codons[codonCount])
            codonCount += 1
        else:
            gappedCodons.append('---')
    return ''.join(gappedCodons)

Usage

>>> unaligned_dna_seq = 'ATGATGATG'
>>> aligned_peptide_seq = 'M-MM'
>>> aligned_dna_seq = gapsFromPeptide(aligned_peptide_seq, unaligned_dna_seq)
>>> print(aligned_dna_seq)
ATG---ATGATG
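
For a pair of orthologues, the same transfer can be applied to both sequences with a couple of sanity checks. This is only a sketch: `transfer_gaps` and `codon_align_pair` are hypothetical helper names, and it assumes each DNA sequence is in frame and contains exactly one codon per non-gap residue (any trailing stop codon already removed).

```python
def transfer_gaps(aligned_peptide, dna):
    """Return dna with '---' inserted wherever aligned_peptide has a gap."""
    n_residues = len(aligned_peptide) - aligned_peptide.count('-')
    if len(dna) != 3 * n_residues:
        raise ValueError('DNA length does not match the number of residues')
    # Lazily yield codons; one is consumed for each non-gap residue.
    codons = (dna[i:i + 3] for i in range(0, len(dna), 3))
    return ''.join('---' if aa == '-' else next(codons) for aa in aligned_peptide)


def codon_align_pair(aligned_pep_a, aligned_pep_b, dna_a, dna_b):
    """Codon-align two DNA sequences from their pairwise peptide alignment."""
    if len(aligned_pep_a) != len(aligned_pep_b):
        raise ValueError('Aligned peptides must have equal length')
    return transfer_gaps(aligned_pep_a, dna_a), transfer_gaps(aligned_pep_b, dna_b)


aligned_a, aligned_b = codon_align_pair('MK-L', 'M-RL', 'ATGAAACTG', 'ATGCGTCTG')
print(aligned_a)  # ATGAAA---CTG
print(aligned_b)  # ATG---CGTCTG
```

The length check catches the common case where the DNA still carries a stop codon or is out of frame, which would otherwise silently shift every codon after the mismatch.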
0
Ruben On

I understand you asked this question three years ago, but this post is the first result I get when searching Google for 'codon alignment python'. I therefore wanted to respond for anyone who stumbles upon it while still looking for a library to do this.

You can use the library PyCogent for this.

They explain it well on their website: http://pycogent.org/examples/align_codons_to_protein.html