I have thousands of protein sequences in FASTA format, along with their accession numbers. I want to go back to the whole genome shotgun (WGS) database and retrieve every DNA sequence that encodes a protein identical to one in my initial list.
I've tried running tblastn limited to fewer than 10 results per sequence (or 1 per query), with an e-value cutoff of 1e-100 or an e-value of zero, but I'm not getting any results. I would like to automate this entire process.
Is this something that can be done by running BLAST from the command line with a batch script?
You should get at least one result: the sequence that encodes the original protein. Any others would be pseudogenes, if I follow you.
Anyway, a bit of programming may help: check out Biopython. BioPerl or BioRuby should have similar features. In particular, you can run BLAST from Biopython.
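
As a rough illustration of the scripted route, here is a minimal sketch that uses only the Python standard library rather than Biopython. It assumes the BLAST+ `tblastn` program is installed and that you have a local nucleotide database; the names `proteins.fasta`, `wgs_db`, and `hits.tsv` are placeholders, and `build_tblastn_command` and `identical_hits` are hypothetical helper names, not part of any library. The idea is to ask for tabular output with the query length included, then keep only hits that cover the whole query with no mismatches or gaps.

```python
import csv
import io
import subprocess

# Columns requested from BLAST+ tabular output (-outfmt 6).
FIELDS = ["qseqid", "sseqid", "pident", "length", "qlen",
          "mismatch", "gaps", "sstart", "send", "evalue"]


def build_tblastn_command(query_fasta, database, out_path, evalue="1e-100"):
    """Return the argument list for one tblastn run over every query in a FASTA file."""
    return [
        "tblastn",
        "-query", query_fasta,
        "-db", database,          # placeholder: name of a local nucleotide database
        "-evalue", evalue,
        "-outfmt", "6 " + " ".join(FIELDS),
        "-out", out_path,
    ]


def identical_hits(tabular_text):
    """Yield hits whose alignment spans the full query with no mismatches or gaps."""
    reader = csv.DictReader(io.StringIO(tabular_text),
                            fieldnames=FIELDS, delimiter="\t")
    for row in reader:
        full_length = int(row["length"]) == int(row["qlen"])
        exact = int(row["mismatch"]) == 0 and int(row["gaps"]) == 0
        if full_length and exact:
            yield row


def run_search(query_fasta, database, out_path):
    """Run tblastn (must be on PATH) and return the identical hits as a list."""
    subprocess.run(build_tblastn_command(query_fasta, database, out_path),
                   check=True)
    with open(out_path) as handle:
        return list(identical_hits(handle.read()))
```

Calling `run_search("proteins.fasta", "wgs_db", "hits.tsv")` would process all queries in one run, and the `sseqid`, `sstart`, and `send` fields of each returned hit tell you which DNA region to retrieve. The e-value cutoff is left as a parameter because a short protein may not reach 1e-100 even with a perfect match, which could be one reason for getting no results.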