I am working with strings of non-Latin characters. I want to match strings with reduplication patterns, such as AAB, ABB, ABAB, etc. I tried out the following code:
import re
patternAAB = re.compile(r'\b(\w)\1\w\b')
match = patternAAB.findall(rawtext)
print(match)
However, it returns only the first character of the matched string. I know this happens because of the capturing parentheses around the first \w.
I tried to add capturing parentheses around the whole matched block, but Python gives
error: cannot refer to an open group at position 7
I also found this method, but it didn't work for me:
patternAAB = re.compile(r'\b(\w)\1\w\b')
match = patternAAB.search(rawtext)
if match:
    print(match.group(1))
How could I match the pattern and return the whole matching string?
# Ex. 哈哈笑
# string matches AAB pattern so my code returns 哈
# but not the entire string
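The message "error: cannot refer to an open group at position 7" is telling you that \1 refers to the group with parentheses all around, because its opening parenthesis comes first, and that group is still open when the backreference is reached. The group you want to backreference is number 2, so the pattern r'\b((\w)\2\w)\b' works.
Each item in match then has both groups, the whole matched string first and the repeated character second, for example ('哈哈笑', '哈').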
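As a minimal sketch of the point above, the following compares the nested-group pattern with findall against the original single-group pattern read through finditer and group(0). The sample text in rawtext and the variable names are made up for illustration; the words are separated by spaces so that \b can find their boundaries.

```python
import re

# Hypothetical sample text; words are space-separated so \b sees boundaries.
rawtext = "哈哈笑 你好吗 看看书"

# Outer group is number 1 (its parenthesis opens first), so the repeated
# character is group 2 and the backreference must be \2.
pattern_nested = re.compile(r'\b((\w)\2\w)\b')
pairs = pattern_nested.findall(rawtext)        # list of (whole, repeated char)
whole_nested = [whole for whole, _ in pairs]   # keep only the full strings

# Alternative: keep the original pattern and read group(0), which is always
# the entire match regardless of how many capturing groups there are.
pattern_plain = re.compile(r'\b(\w)\1\w\b')
whole_plain = [m.group(0) for m in pattern_plain.finditer(rawtext)]

print(pairs)
print(whole_nested)
print(whole_plain)
```

The same reasoning applies to the search attempt in the question: match.group(1) is only the first capturing group, while match.group(0) (or match.group() with no argument) is the whole matched string.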