I wrote a Python script to download protein sequences from UniProt in FASTA format. The script reads accession numbers from a text file (one per line) and downloads the corresponding sequence from the UniProt database for each one. Here is the script:
```python
import time
import requests

begin = round(time.time(), 1)

# Change the name of the file containing the UniProt accession IDs
with open('AFdbIDs-324Seqs.txt', 'r') as infile:
    lines = infile.readlines()
    listfile_name = infile.name

file_name = listfile_name.split('.', 1)[0]
# print(file_name)
count = 0
url_part1 = 'https://rest.uniprot.org/uniprotkb/'
url_part2 = '.fasta'

with open(file_name + '_sequences.fa', 'wb') as txtfile:
    for line in lines:
        access_id = line.strip()
        # get the sequence from the URL; the timeout (in seconds) makes a
        # stalled connection raise an error instead of hanging forever
        URL = url_part1 + access_id + url_part2
        response = requests.get(URL, timeout=30)
        if response.ok:
            txtfile.write(response.content)
            count += 1  # count only successful downloads
        else:
            # don't write UniProt's error message into the FASTA file
            print("Could not download", access_id, "- HTTP status", response.status_code)
    print("Total sequences downloaded = ", count)
    time.sleep(1)

end = round(time.time(), 1)
print(f"Time taken = {end - begin} seconds")
```
This Python script works fine. However, when I download hundreds of sequences into one file, it takes quite a while and it looks as if nothing is happening. In fact, once it got stuck, I think because of network issues. I would like to add a progress bar showing what percentage of the download is complete. I found some solutions, but I couldn't fit them into my script and always got errors. I also installed the 'Clint' package, but couldn't get it working in the code either. I'm a beginner, so many things don't come easily to me, and simple solutions would be much appreciated. Thanks
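One way to get a progress indicator without installing any extra package is to redraw a single line of text after each download. The sketch below uses only the standard library; `progress_bar` and `show_progress` are hypothetical helper names introduced here, not part of `requests` or Clint, and the demo at the bottom sleeps instead of downloading so it runs without a network connection.

```python
import sys
import time


def progress_bar(done, total, width=40):
    """Return a text bar like '[####----]' followed by a percentage and a count.

    done  -- number of items processed so far
    total -- total number of items
    width -- number of characters between the brackets
    """
    if total <= 0:
        # nothing to download: show a full bar instead of dividing by zero
        return "[" + "#" * width + "] 100.0% (0/0)"
    filled = width * done // total  # integer maths avoids rounding surprises
    bar = "#" * filled + "-" * (width - filled)
    percent = 100 * done / total
    # 5.1f pads the percentage so the line keeps the same length each time
    return f"[{bar}] {percent:5.1f}% ({done}/{total})"


def show_progress(done, total, width=40):
    # '\r' moves the cursor back to the start of the line, so each call
    # overwrites the previous bar instead of printing a new line
    sys.stdout.write("\r" + progress_bar(done, total, width))
    sys.stdout.flush()
    if done >= total:
        sys.stdout.write("\n")  # move to a new line once everything is done


if __name__ == "__main__":
    # Demo only: time.sleep stands in for requests.get(...)
    total = 4
    for i in range(1, total + 1):
        time.sleep(0.2)
        show_progress(i, total)
```

To use this in the script above (assuming the two helpers are pasted near the top), the loop could count with `enumerate`, for example `for i, line in enumerate(lines, start=1):`, and call `show_progress(i, len(lines))` as the last statement of each loop iteration, so the bar advances whether or not a particular download succeeded.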