Python script: sequence identifier and number of possible sequences


I need to use Python for a school project, but I really don't know where to start.

The question is: A FASTA file contains a number of DNA sequences. Unfortunately, some of the symbols are ambiguous. The encoding is IUPAC (http://www.bioinformatics.org/sms/iupac.html). Write a Python script that, given the name of the FASTA file, writes the sequence identifier and the number of possible sequences for each sequence in the file. Example: for the—very short—sequence “AYGH” the number of possible sequences would be 6.


There is 1 answer

Answered by Biopy

Try a dictionary like this:

nucleotides = {
    'A': ['A'], 'C': ['C'], 'G': ['G'], 'T': ['T'], 'U': ['U'],
    'R': ['A', 'G'], 'Y': ['C', 'T'], 'S': ['G', 'C'], 'W': ['A', 'T'],
    'K': ['G', 'T'], 'M': ['A', 'C'],
    'B': ['C', 'G', 'T'], 'D': ['A', 'G', 'T'], 'H': ['A', 'C', 'T'], 'V': ['A', 'C', 'G'],
    'N': ['A', 'C', 'G', 'T'],
    '-': ['-'], '.': ['-'],
}

Then loop over each nucleotide of your sequence, look up how many bases it can stand for, and multiply those counts together. For "AYGH" that gives 1 × 2 × 1 × 3 = 6.
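A minimal sketch of that approach might look like the following. It reuses the dictionary above and relies only on the standard library. The helper names (`count_possible_sequences`, `read_fasta`, `main`) are made up for illustration, and two things are assumptions rather than part of the question: the sequence identifier is taken to be the first word after `>` on a header line, and any symbol missing from the dictionary raises a `KeyError`.

```python
from math import prod  # available in Python 3.8+

# IUPAC symbol -> bases it can stand for (same table as in the answer above).
nucleotides = {
    'A': ['A'], 'C': ['C'], 'G': ['G'], 'T': ['T'], 'U': ['U'],
    'R': ['A', 'G'], 'Y': ['C', 'T'], 'S': ['G', 'C'], 'W': ['A', 'T'],
    'K': ['G', 'T'], 'M': ['A', 'C'],
    'B': ['C', 'G', 'T'], 'D': ['A', 'G', 'T'], 'H': ['A', 'C', 'T'], 'V': ['A', 'C', 'G'],
    'N': ['A', 'C', 'G', 'T'],
    '-': ['-'], '.': ['-'],
}


def count_possible_sequences(sequence):
    """Multiply the number of possible bases at each position."""
    return prod(len(nucleotides[symbol]) for symbol in sequence.upper())


def read_fasta(lines):
    """Yield (identifier, sequence) pairs from an iterable of FASTA lines."""
    identifier, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith('>'):
            # A new header: emit the record collected so far, if any.
            if identifier is not None:
                yield identifier, ''.join(chunks)
            words = line[1:].split()
            # Assumption: the identifier is the first word of the header.
            identifier = words[0] if words else ''
            chunks = []
        elif identifier is not None:
            chunks.append(line)  # sequence data may span several lines
    if identifier is not None:
        yield identifier, ''.join(chunks)


def main(path):
    """Print each identifier with its number of possible sequences."""
    with open(path) as handle:
        for identifier, sequence in read_fasta(handle):
            print(identifier, count_possible_sequences(sequence))
```

To take the file name from the command line, you could call `main(sys.argv[1])` after `import sys`. Counting by multiplication avoids building every possible sequence, which matters because a run of a few dozen `N` symbols already has far more expansions than could be listed.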