I have spent way too much time on this now (10+ hours).
The input is a file in FASTA format. The output should be a text file containing the gene ID and the matched patterns (three different patterns).
I wanted to write my own function to avoid writing the same code three times, but I've given up for now and just written it three times (and it works fine).
Is there a way to use this:
records = list(SeqIO.parse('mytextfile.fasta', 'fasta'))
instead of the code that I'm currently using three times (shown below), or some other function? It's for a school assignment, so it shouldn't be too complicated, but I have to use the Bio and re modules to solve it.
from Bio import SeqIO
import re

outfile = 'sekvenser.txt'
for seq_record in SeqIO.parse('prot_sequences.fasta', 'fasta'):
    match = re.findall(r'W.P', str(seq_record.seq), re.I)
    if match:
        with open(outfile, 'a') as f:
            record_string = str(seq_record.id)
            newmatch = str(match)
            result = record_string+'\t'+newmatch
            print(result)
            f.write(result + '\n')
I've tried this:

records = list(SeqIO.parse('prot_sequences.fasta', 'fasta'))
new_list = []
i = r'W.P'
for i in records:
    match = re.findall(i)
    if match:
        new_list.append(match)
print(new_list)
But it only gives me the error that findall() is missing 1 required positional argument: 'string'.
As far as I can see, i is a string (since I made the variable). Obviously I'm doing something wrong. If I try to insert the seq_record that I'm using in my other code, it tells me that seq_record isn't defined. I don't understand what I'm supposed to put after the i in the code.
Input
prot_sequences.fasta:
code:
output:
If you uncomment:
#print('rec : ', rec, rec.seq , type(rec.seq))
you'll see that rec.seq is a <class 'Bio.Seq.Seq'>, so it is not suitable as an argument to re.findall(pattern, string, flags=0) until it is converted with str(rec.seq). re.findall does give you a list of strings, but on its own you will lose the rec.id needed for your assignment.
See Bio.SeqIO.parse(handle, format, alphabet=None) to get a grasp of the properties of the objects it yields.
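One way to keep the ID together with the matches, and to write the loop only once for all three patterns, is a small helper that takes the pattern as a parameter. This is only a minimal sketch, not the code from the original answer: find_pattern, the Record stand-in, the sample sequences and the second and third patterns are all made up for illustration. It also shows what the attempt in the question was missing: re.findall needs both the pattern and the string to search, and the loop variable must not reuse the name that holds the pattern.

```python
import re
from collections import namedtuple

# Hypothetical stand-in for a Biopython SeqRecord: only .id and .seq are used.
Record = namedtuple("Record", ["id", "seq"])


def find_pattern(records, pattern):
    """Return (record id, list of matches) for every record with a match."""
    hits = []
    for rec in records:
        # re.findall takes the pattern AND the string to search.
        # str() turns a Bio.Seq.Seq (or a plain string) into searchable text.
        matches = re.findall(pattern, str(rec.seq), re.I)
        if matches:
            hits.append((rec.id, matches))
    return hits


# Made-up example data, not taken from prot_sequences.fasta.
records = [
    Record("gene1", "MKWAPLLWCPQ"),
    Record("gene2", "MKTAYIAKQR"),
]

# W.P is the pattern from the question; the other two are placeholders.
patterns = [r"W.P", r"K.A", r"QQQ"]
results = {p: find_pattern(records, p) for p in patterns}

for p, hits in results.items():
    for rec_id, matches in hits:
        print(p, rec_id, matches, sep="\t")
```

With Biopython installed, list(SeqIO.parse('prot_sequences.fasta', 'fasta')) could be passed as records instead of the stand-in list, since the helper only reads .id and .seq.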
You could do:
to get:
But I am not sure it's the best way to do that: it is actually very difficult to read, and I am not sure it is fast, because (not sure again) it uses re.findall twice (see "Does Python automatically optimize/cache function calls?"). So to make it maybe faster, but even uglier, use:
out:
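The trade-off mentioned here can be sketched roughly as follows. This is not the code from the original answer, just an illustration with made-up sequences: Python does not cache the result of a function call on its own, so a comprehension that filters on re.findall and also keeps its result runs the search twice per sequence, while wrapping the call in an inner generator runs it once.

```python
import re

# Made-up example data, not taken from prot_sequences.fasta.
seqs = {"gene1": "MKWAPLLWCPQ", "gene2": "MKTAYIAKQR"}
pattern = re.compile(r"W.P", re.I)

# Version 1: findall runs twice per sequence (once to filter, once to keep).
twice = [
    (gene_id, pattern.findall(seq))
    for gene_id, seq in seqs.items()
    if pattern.findall(seq)
]

# Version 2: findall runs once; the inner generator carries the result along.
once = [
    (gene_id, matches)
    for gene_id, matches in (
        (gene_id, pattern.findall(seq)) for gene_id, seq in seqs.items()
    )
    if matches
]

print(twice)
print(once)
```

Both versions build the same list; for an assignment-sized FASTA file the plain for loop from the question is probably the most readable choice.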
As for the use of list(), the ones below give the same result:
res:
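Why list() makes no difference to the result can be shown with a small stand-in. SeqIO.parse returns an iterator, so the sketch below uses a hypothetical generator, fake_parse, in its place; the IDs are made up. Looping over the iterator directly or over list(...) of it visits the same items in the same order. The practical difference is that a list can be looped over several times, while an iterator is used up after the first pass.

```python
def fake_parse():
    """Hypothetical stand-in for SeqIO.parse: yields items one at a time."""
    yield from ["gene1", "gene2", "gene3"]


# Iterating directly and iterating over list(...) give the same items.
direct = [rec for rec in fake_parse()]
via_list = [rec for rec in list(fake_parse())]
print(direct == via_list)

# An iterator can only be consumed once; a list can be reused.
it = fake_parse()
first_pass = list(it)
second_pass = list(it)  # the iterator is already exhausted here
print(first_pass, second_pass)
```

So list() is only needed when the records have to be looped over more than once, for example once per pattern.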