I am trying to find multiple specific sequences in a DNA sequence stored in FASTA format and then print them out. For simplicity, I made a short string to show my problem.
import re
seq = "QPPLSK"
find_in_seq = re.search(r"[^P](P|K|R|H|W)", seq)
print find_in_seq.string[find_in_seq.start():find_in_seq.end()]
I only get one match, "QP", when there are two matches, "QP" and "SK". How do I get both matches instead of just the first one?
Thanks
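Use re.findall and change the regex so that it no longer contains a capturing group: [^P](?:P|K|R|H|W) or, more simply, [^P][PKRHW]. When a pattern contains a capturing group, re.findall returns only the text captured by that group rather than the whole match. See the Python demo.
Note that [^P] matches any character other than P, including digits and line breaks, so if you want to match any letter other than P, you'd better use [A-OQ-Z].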
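A minimal sketch of the suggestion above, assuming Python 3 (print is called as a function) and the sample string from the question. The second string, multiline_seq, is a hypothetical input added only to show how [^P] and [A-OQ-Z] differ when a line break is present, as it can be in text read from a FASTA file.

```python
import re

seq = "QPPLSK"

# With a capturing group, findall returns only the group's text.
captured = re.findall(r"[^P](P|K|R|H|W)", seq)
print(captured)  # ['P', 'K']

# Without a capturing group, findall returns each whole match.
matches = re.findall(r"[^P][PKRHW]", seq)
print(matches)  # ['QP', 'SK']

# finditer yields match objects, which also expose the positions.
spans = [(m.group(), m.start(), m.end())
         for m in re.finditer(r"[^P][PKRHW]", seq)]
print(spans)  # [('QP', 0, 2), ('SK', 4, 6)]

# Hypothetical input containing a line break.
multiline_seq = "A\nK"

# [^P] accepts the newline as the first character of a match...
loose = re.findall(r"[^P][PKRHW]", multiline_seq)
print(loose)  # ['\nK']

# ...while [A-OQ-Z] only accepts uppercase letters other than P.
strict = re.findall(r"[A-OQ-Z][PKRHW]", multiline_seq)
print(strict)  # []
```

Both re.findall and re.finditer report non-overlapping matches, scanning left to right, which is why the "PP" in the middle of the sample string produces no extra result.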