Retrieving DNA sequences from a database of protein sequences?


I have thousands of protein sequences in FASTA format, along with their accession numbers. I want to go back into the whole genome shotgun database and retrieve all DNA sequences that encode a protein identical to one in my initial list.

I've tried running tblastn, limiting the output to fewer than 10 results per sequence (or 1 per query), with an e-value cutoff below 1e-100 or an e-value of zero, but I'm not getting any results. I would like to automate this entire process.

Is this something that can be done by running BLAST from the command line with a batch script?


There are 2 answers

Hugues Fontenelle On

You should get at least one result: the sequence that encodes the original protein. The others, if any, would be pseudogenes, if I follow you correctly.

Anyway, a bit of programming may help: check out Biopython. BioPerl or BioRuby should have similar features. In particular, you can run BLAST from Biopython.
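As a rough sketch of how the automation could look, the snippet below builds a tblastn command line with tabular output and then keeps only hits that cover the whole query with no mismatches or gaps. It assumes the BLAST+ command-line tools are installed and that you have a local nucleotide database; the helper names (`build_tblastn_cmd`, `identical_hits`), the column selection, and the e-value default are my own choices for illustration, not something taken from the question.

```python
import csv
import io

# Columns requested from tblastn via "-outfmt 6 ..." (an assumed selection).
FIELDS = ["qseqid", "sseqid", "pident", "length", "qlen",
          "mismatch", "gapopen", "sstart", "send", "evalue"]


def build_tblastn_cmd(query_fasta, db, out_path, evalue=1e-100):
    """Return the tblastn argument list; run it with subprocess.run(cmd, check=True)."""
    return [
        "tblastn",
        "-query", query_fasta,   # multi-FASTA file with all protein queries
        "-db", db,               # name of a local nucleotide BLAST database
        "-evalue", str(evalue),
        "-outfmt", "6 " + " ".join(FIELDS),
        "-out", out_path,
    ]


def identical_hits(tabular_text):
    """Keep hits whose alignment spans the full query with no mismatches or gaps."""
    reader = csv.DictReader(io.StringIO(tabular_text),
                            fieldnames=FIELDS, delimiter="\t")
    hits = []
    for row in reader:
        full_length = int(row["length"]) == int(row["qlen"])
        exact = int(row["mismatch"]) == 0 and int(row["gapopen"]) == 0
        if full_length and exact:
            start, end = int(row["sstart"]), int(row["send"])
            # tblastn reports sstart > send for hits on the reverse strand.
            strand = "+" if start <= end else "-"
            hits.append((row["qseqid"], row["sseqid"],
                         min(start, end), max(start, end), strand))
    return hits
```

Because the query file can hold all of your proteins at once, a single tblastn run covers the whole list. The subject coordinates and strand returned by `identical_hits` could then be passed to something like `blastdbcmd` (with its `-entry`, `-range`, and `-strand` options) to pull out the matching DNA, though I have not tested that step against the WGS database.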

Fatt On

You might find this link useful:

A similar question has been asked there, and some reasonable solutions have been posted.