Retrieving DNA sequences from a database of protein sequences?

Question

364 views · Asked by Andrew on 05 December 2014 at 16:51

I have thousands of protein sequences in FASTA format, along with their accession numbers. I want to go back into the whole genome shotgun database and retrieve all DNA sequences that encode a protein identical to one in my initial list.

I've tried running tblastn limited to fewer than 10 results per sequence, one sequence per query, with an e-value threshold below 1e-100 or an e-value of zero, but I'm not getting any results. I would like to automate this entire process.

Is this something that can be done by running blast from the command line and a batch script?

There are 2 answers

**Hugues Fontenelle** · Answer 1 · 2014-12-08T09:37:45+00:00

You should get at least one result: the one that encodes the original protein. The others, if any, would be pseudogenes, if I follow you.
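To check whether that expected exact match is really in the output, one option is to filter tabular tblastn results for hits that are 100% identical across the full query length. The sketch below is only an illustration: it assumes the search was run with `-outfmt "6 qseqid sseqid pident length qlen sstart send evalue"`, and the function name `identical_hits` and the sample IDs are made up for the example.

```python
import csv
import io

# Column order assumed to match:
#   -outfmt "6 qseqid sseqid pident length qlen sstart send evalue"
FIELDS = ["qseqid", "sseqid", "pident", "length", "qlen",
          "sstart", "send", "evalue"]


def identical_hits(tabular_text):
    """Return hits that are 100% identical over the whole query protein."""
    hits = []
    reader = csv.DictReader(io.StringIO(tabular_text),
                            fieldnames=FIELDS, delimiter="\t")
    for row in reader:
        # Skip comment lines (present if outfmt 7 was used instead of 6).
        if row["qseqid"].startswith("#"):
            continue
        # Alignment length equal to query length means the hit covers
        # every residue of the query protein.
        covers_query = int(row["length"]) == int(row["qlen"])
        if float(row["pident"]) == 100.0 and covers_query:
            hits.append({
                "qseqid": row["qseqid"],
                "sseqid": row["sseqid"],
                # Nucleotide coordinates on the subject; sstart > send
                # indicates a hit on the reverse strand.
                "sstart": int(row["sstart"]),
                "send": int(row["send"]),
            })
    return hits
```

The subject coordinates returned here could then be used to pull the matching DNA region out of the database, for example with `blastdbcmd` and its `-range` option.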

Anyway, a bit of programming may help: check out Biopython. BioPerl or BioRuby should have similar features. In particular, you can run BLAST from Biopython.
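As a rough sketch of what the automation could look like, the script below calls the BLAST+ `tblastn` executable directly through Python's `subprocess` module rather than through Biopython's wrappers. It assumes BLAST+ is installed and that a nucleotide database has already been built or downloaded; the file names `proteins.fasta`, `hits.tsv` and the database name `my_wgs_db` are placeholders, and the helper names are invented for this example. A multi-record FASTA file can be passed as the query, so all proteins are searched in one run.

```python
import shlex
import shutil
import subprocess

# Tabular output with the columns needed to judge identity and coverage.
OUTFMT = "6 qseqid sseqid pident length qlen sstart send evalue"


def build_tblastn_command(query_fasta, database, out_path,
                          evalue="1e-100", max_targets=10):
    """Build the tblastn argument list for a batch protein-vs-DNA search."""
    return [
        "tblastn",
        "-query", query_fasta,        # FASTA file, may hold many proteins
        "-db", database,              # nucleotide BLAST database name
        "-evalue", str(evalue),
        "-max_target_seqs", str(max_targets),
        "-outfmt", OUTFMT,
        "-out", out_path,
    ]


def run_tblastn(query_fasta, database, out_path, **options):
    """Run tblastn and raise if it is missing or exits with an error."""
    if shutil.which("tblastn") is None:
        raise RuntimeError("tblastn not found on PATH; install NCBI BLAST+")
    command = build_tblastn_command(query_fasta, database, out_path, **options)
    subprocess.run(command, check=True)


if __name__ == "__main__":
    # Print the command instead of running it, so it can be inspected first.
    print(shlex.join(build_tblastn_command("proteins.fasta",
                                           "my_wgs_db", "hits.tsv")))
```

If no hits come back at all, loosening `evalue` first and filtering for identity afterwards may be easier to debug than starting with a very strict threshold.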

**Fatt** · Answer 2 · 2014-12-09T21:35:15+00:00

You might find this link useful:

https://www.biostars.org/p/5403/

A similar question has been asked there, and some reasonable solutions have been posted.
