Blast+ Local Configuration: How to configure nt and nr databases?

2.4k views Asked by At

I am configuring Blast+ on my mac (os sierra) and am having trouble configuring my nr and nt databases that I also downloaded locally. I am trying to follow NCBI's instructions here, and am getting hung up on the Configuration and Example Execution steps.

They say to change my .bash_profile so that it says:

export PATH=$PATH:$HOME/Documents/Luke/Research/Pedulla\ 17-18/blast/ncbi-blast-2.6.0+/bin

That works fine, and they say configure a path for BLASTDB "similarly" but to the file where my DB will be, so I have done this:

export BLASTDB=$BLASTDB:$HOME/Documents/Luke/Research/Pedulla\ 17-18/blast/blastdb/nt.00

which specifies the exact folder that I got when I unzipped the nt tar file from their FTP. With this path, if I run the command...

blastn -query test_query.fa -db nt.00 -task blastn -outfmt "7 qseqid sseqid evalue bitscore" -max_target_seqs 5

then it runs successfully and I get results, but I am worried that these are only being checked against the nt.00 section of the entire nt.00 database file, especially because if I run my test_query.fa sequence on the Web Blast, I get different results.

Also, their instructions say that the path only needs to point to the folder that contains the whole database folder nt.00, from the tar I unzipped--and not the specific nt.00 itself--, which in my case would just be "blastdb/" (As opposed to "blastdb/nt.00/" which then contains nt.00.nhd, nt.00.nal, etc.). That makes sense because when I am working I want to be able to blastn on the nt database but also blastp on the nr one, etc. by changing the -db flag on my command, and there shouldn't be a problem with having them all in this folder, right? But if I must specify the path for BLASTDB with the nt.00 DB added to the end, how could I ever use nr.00 in the same folder (blastdb/)? Essentially, I want to do as the instructions say, and just have this:

export BLASTDB=$BLASTDB:$HOME/Documents/Luke/Research/Pedulla\ 17-18/blast/blastdb/

And then depending on what database I want to use I could just say so after the -db flag on my command. But when I make the path like that above, it gives me this error:

BLAST Database error: No alias or index file found for nucleotide database [nt] in search path [/Users/LJStout::/Users/LJStout/Documents/Luke/Research/Pedulla 17-18/blast/blastdb:]

I have tried running that same blastn command from above and swapping out "nt" for "nt.00", and have tried these commands with the path for BLASTDB ending in both "blastdb/" and "blastdb/nt" and of course "blastdb/nt.00" which is the only one that runs without errors.

Here's an example of another thread I read where the OP is worried about his executions not checking the entire nt.00 folder, this was different than my problem however.

Thanks for you help!

1

There are 1 answers

0
ljs On BEST ANSWER

This whole problem came down to having the nt.00 & nr.00 files, the original folders that result from unzipping their respective .tar.gz's, in the same parent folder when it should be that their contents are in the same parent folder. I simply deleted the folders they came in and copied the contents over to my new, singular parent. I was kind of mislead by the instructions, it was a simple mistake. Now, I have one folder, blastdb/ that contains all of the contents of every database I plan on using, including nt,nr, and refseq.