How to use {} in regex pattern with findall + Python

I'm creating a regex as below:

import re
asd = re.compile(r"(blah){2}")
mo = asd.search("blahblahblahblahblahblah ll2l 21HeHeHeHeHeHe lllo")
mo1 = asd.findall("blahblahblahblahblahblah")
print(mo.group())
print("findall output: ", mo1)

This prints:

blahblah
findall output:  ['blah', 'blah', 'blah']

- Why does findall return 'blah' three times when the pattern specifies {2}?

If I change to {4}, then findall matches:

asd = re.compile(r"(blah){4}")
findall output:  ['blah']

- How is {m} treated by re.search and re.findall?

Thanks a lot.

There are 2 answers

Dekel

If you want findall to return the whole (blah){2} match (both repetitions of blah), wrap it in an outer capturing group:

asd = re.compile(r"((?:blah){2})")

Note that I made the inner group non-capturing with (?:...), so only the outer group is returned.

>>> asd = re.compile(r"((?:blah){2})")
>>> mo = asd.search("blahblahblahblahblahblah ll2l 21HeHeHeHeHeHe lllo")
>>> mo1 = asd.findall("blahblahblahblahblahblah")
>>> print(mo.group())
blahblah
>>> print("findall output: ", mo1)
findall output:  ['blahblah', 'blahblah', 'blahblah']

Exactly the same goes for the {4} you have there. The regex matches all four repetitions, but the group captures only one blah. If you want findall to return the whole match, wrap it in an outer group as well.
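As a rough sketch of two other ways to get the full match, assuming you don't need the inner group's text: drop the capturing group entirely, or keep your original pattern and read group(0) from re.finditer. The names text, whole and spans are only for illustration.

```python
import re

text = "blahblahblahblahblahblah"  # six repetitions of "blah"

# With no capturing group in the pattern, findall returns the full match text.
whole = re.findall(r"(?:blah){2}", text)
print(whole)

# finditer yields match objects, so group(0) is the full match even though
# the pattern still contains a capturing group.
spans = [m.group(0) for m in re.finditer(r"(blah){2}", text)]
print(spans)
```

Both lists should contain 'blahblah' three times, the same result as wrapping the pattern in an outer group.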

ryugie

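(blah){2} matches and consumes the string blahblah, but a repeated capturing group keeps only its last iteration, so the group holds just the final blah. When a pattern has exactly one capturing group, findall returns that group's text rather than the whole match. Since your string contains three non-overlapping blahblah matches, it will output ['blah', 'blah', 'blah'].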

(blah){4} can match only once, because after the first four repetitions only two remain, so it gives you ['blah'].
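To make the difference between search and findall concrete, here is a small sketch using the six-blah string from the question. The variable names are arbitrary. It compares the whole match, group(0), with the captured group, group(1), and then counts what findall returns for {2} and {4}.

```python
import re

text = "blahblahblahblahblahblah"  # six repetitions of "blah"

m = re.search(r"(blah){2}", text)
print(m.group(0))  # whole match: blahblah
print(m.group(1))  # last iteration of the repeated group: blah
print(m.span(1))   # the group's final position inside the match: (4, 8)

# findall returns the group's text once per non-overlapping match.
pairs = re.findall(r"(blah){2}", text)  # three matches of two repetitions
fours = re.findall(r"(blah){4}", text)  # one match; only two repetitions remain
print(pairs)
print(fours)
```

So {m} behaves the same way in both functions. The only difference is what each one hands back: search gives a match object where group() defaults to the whole match, while findall gives the captured group whenever the pattern has one.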