Retrieving DNA sequences from a database of protein sequences?


I have thousands of protein sequences in FASTA format, along with their accession numbers. I want to go back to the whole-genome shotgun (WGS) database and retrieve every DNA sequence that encodes a protein identical to one in my initial list.

I've tried running tblastn limited to fewer than 10 results per sequence (or 1 per query), with an e-value cutoff of 1e-100 or an e-value of zero, and I'm not getting any results. I would like to automate this entire process.

Is this something that can be done by running BLAST from the command line with a batch script?


There are 2 answers

Hugues Fontenelle

You should get at least one result: the sequence that encodes the original protein. The others, if any, would be pseudogenes, if I follow you.

Anyway, a bit of programming may help: check out Biopython (BioPerl or BioRuby should have similar features). In particular, you can run BLAST from Biopython.
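To make the scripting idea concrete, here is a minimal sketch that uses only the Python standard library and a locally installed BLAST+ `tblastn` binary rather than Biopython. It assumes you can point `-db` at a nucleotide database (the name `my_wgs_db` in the usage note is hypothetical), and it treats a hit as "identical" when the number of identical residues (`nident`) equals the query length (`qlen`). The e-value of 1e-10 is an arbitrary loose cutoff I picked so that the strict filtering happens afterwards in Python. The function names are my own.

```python
import subprocess

# Columns requested from BLAST+ tabular output (-outfmt 6 with custom fields).
FIELDS = ["qseqid", "sseqid", "nident", "qlen", "sstart", "send", "evalue"]


def build_tblastn_command(query_fasta, database, out_path):
    """Return the argument list for one batch tblastn run over all queries."""
    return [
        "tblastn",
        "-query", query_fasta,   # multi-FASTA file holding every protein
        "-db", database,         # assumption: a database name/path you have access to
        "-outfmt", "6 " + " ".join(FIELDS),
        "-evalue", "1e-10",      # assumed loose cutoff; exact filtering is done below
        "-out", out_path,
    ]


def identical_hits(tabular_lines):
    """Yield (query, subject, sstart, send) for hits identical over the full query."""
    for line in tabular_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        row = dict(zip(FIELDS, line.split("\t")))
        # Every query residue is matched identically when nident == qlen.
        if int(row["nident"]) == int(row["qlen"]):
            yield row["qseqid"], row["sseqid"], int(row["sstart"]), int(row["send"])


def run(query_fasta, database, out_path):
    """Run tblastn once, then return the identical full-length hits."""
    cmd = build_tblastn_command(query_fasta, database, out_path)
    subprocess.run(cmd, check=True)
    with open(out_path) as handle:
        return list(identical_hits(handle))
```

Calling `run("proteins.fasta", "my_wgs_db", "hits.tsv")` would give you subject IDs and coordinates that you can then use to pull out the DNA. One caveat: this only finds proteins covered by a single alignment, so a gene split across exons in eukaryotic data will show up as several partial hits and be missed by this filter.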

Fatt

You might find this link useful:

https://www.biostars.org/p/5403/

A similar question has been asked there, and some reasonable solutions have been posted.