How can I upload multiple sequences to BLAST using Biopython?

I am trying to run BLASTN searches of multiple sequences from a single FASTA file. I can easily query a single sequence from a file but am struggling to query all the sequences in one file. As these are relatively short reads, I would rather not split the file into individual sequences and query each one separately.

This is what I have tried so far:

    from Bio import SeqIO
    from Bio.Blast import NCBIWWW

    # Take only the first record from the FASTA file.
    f_iterator = SeqIO.parse("file.fasta", "fasta")
    f_record = next(f_iterator)

    # Submit that single record, as FASTA text, to BLASTN against nt.
    result_handle = NCBIWWW.qblast("blastn", "nt", f_record.format("fasta"))
    with open("blast_result.xml", "w") as save_result:
        save_result.write(result_handle.read())
    result_handle.close()

Does anybody have any ideas?

There are 2 answers

mikhael

Can't you simply pass the whole content of the multi-sequence FASTA file (read straight from the file) instead of single records?

    from Bio.Blast import NCBIWWW

    # Read the whole multi-sequence FASTA file as one string.
    with open("file.fasta", "r") as fasta_file:
        sequences = fasta_file.read()

    # Submit all sequences in a single BLASTN query against nt.
    result_handle = NCBIWWW.qblast("blastn", "nt", sequences)
    with open("blast_result.xml", "w") as save_result:
        save_result.write(result_handle.read())
    result_handle.close()
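
Since every sequence goes out in one request, the saved XML should hold one result set per query. Here is a minimal sketch for checking that with the standard library only. It assumes the legacy BLAST XML layout (`Iteration`, `Iteration_query-def` and `Hit` elements); the function name `hits_per_query` is hypothetical, and the sample document is made up for illustration, not real BLAST output. Biopython's own `NCBIXML.parse` is probably the more usual way to read these files.

```python
import xml.etree.ElementTree as ET


def hits_per_query(xml_text):
    """Return {query title: number of hits} from BLAST XML text.

    Assumes the legacy BLAST XML layout, with one <Iteration> per query.
    """
    root = ET.fromstring(xml_text)
    counts = {}
    for iteration in root.iter("Iteration"):
        # Title of the query sequence this result set belongs to.
        query = iteration.findtext("Iteration_query-def", default="(unnamed)")
        # Each <Hit> under <Iteration_hits> is one matching subject.
        hits = iteration.findall("Iteration_hits/Hit")
        counts[query] = len(hits)
    return counts


# Made-up sample with two queries: one with two hits, one with none.
SAMPLE_XML = """<BlastOutput>
  <BlastOutput_iterations>
    <Iteration>
      <Iteration_query-def>read_1</Iteration_query-def>
      <Iteration_hits>
        <Hit><Hit_def>subject A</Hit_def></Hit>
        <Hit><Hit_def>subject B</Hit_def></Hit>
      </Iteration_hits>
    </Iteration>
    <Iteration>
      <Iteration_query-def>read_2</Iteration_query-def>
      <Iteration_hits></Iteration_hits>
    </Iteration>
  </BlastOutput_iterations>
</BlastOutput>"""

summary = hits_per_query(SAMPLE_XML)
```

For a saved result you would pass the file's text instead, e.g. `hits_per_query(open("blast_result.xml").read())`. If the number of keys matches the number of records in your FASTA file, all queries were processed.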

Sweasonable Doubt

If your file is already in FASTA format, you can just read it with a plain open/read and pass the resulting string to qblast. This is taken straight from the Biopython cookbook.

http://biopython.org/DIST/docs/tutorial/Tutorial.html#htoc92

    fasta_string = open("m_cold.fasta").read()

I run a simple script like this all the time:

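    from Bio.Blast import NCBIWWW

    # Read every record in the FASTA file as a single string.
    with open("file.fasta") as fasta_file:
        fasta_string = fasta_file.read()

    # qblast has to be called through the imported NCBIWWW module.
    result_handle = NCBIWWW.qblast("blastn", "nt", fasta_string)

    with open("out.xml", "w") as save_file:
        save_file.write(result_handle.read())
    result_handle.close()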

If that doesn't work, check that your file is valid FASTA. A format converter is available here:

https://www.hiv.lanl.gov/content/sequence/FORMAT_CONVERSION/form.html
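
For a quick local check before submitting, something like the following may help. It is only a rough sketch: the function name `check_fasta` is hypothetical, and it tests nothing beyond the basic layout (each record starts with a `>` header and has at least one sequence line). It does not validate the sequence characters themselves.

```python
def check_fasta(text):
    """Return a list of layout problems found in FASTA-formatted text."""
    problems = []
    header = None          # header of the record currently being read
    has_sequence = False   # whether that record has any sequence lines
    records = 0
    for number, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue  # blank lines are ignored
        if line.startswith(">"):
            # A new header closes the previous record.
            if header is not None and not has_sequence:
                problems.append(f"record '{header}' has no sequence")
            header = line[1:]
            has_sequence = False
            records += 1
            if not header:
                problems.append(f"line {number}: empty header")
        elif header is None:
            problems.append(f"line {number}: sequence before any '>' header")
        else:
            has_sequence = True
    # The last record is closed by the end of the text.
    if header is not None and not has_sequence:
        problems.append(f"record '{header}' has no sequence")
    if records == 0:
        problems.append("no FASTA records found")
    return problems
```

Calling `check_fasta(open("file.fasta").read())` returns an empty list when the layout looks fine, and a list of messages otherwise.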