How to exclude first group's single character from being matched into second group?


I would like to build a regular expression that matches runs of a single repeated character that directly follow one another. For example, the character 'A' three times followed by another character 'B' two times. It doesn't matter if the second group's character is repeated more than two times. For instance, it should match the string wuzDDDFFFxji:

Full match  3-8 `DDDFF`
Group 1.    3-4 `D`
Group 2.    6-7 `F`

I've come up with the following regular expression but there's one limitation.

(.)\1{2}(.)\2{1}

It almost works but it will not exclude the first group's character from being matched in the second group. The string qwuiuQQQQQsas will be matched since:

Full match  5-10    `QQQQQ`
Group 1.    5-6 `Q`
Group 2.    8-9 `Q`
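
The two results above can be reproduced with a short Python sketch. Python's re module is an assumption here, since the question does not name a regex flavor, and the helper name describe is purely illustrative.

```python
import re

# Pattern from the question: three of one character, then two of "another".
PATTERN = re.compile(r"(.)\1{2}(.)\2{1}")

def describe(text):
    """Return (full match, group 1 span, group 2 span), or None if no match."""
    m = PATTERN.search(text)
    if m is None:
        return None
    return m.group(0), m.span(1), m.span(2)

print(describe("wuzDDDFFFxji"))   # the intended match
print(describe("qwuiuQQQQQsas"))  # the unwanted match: group 2 captures 'Q' again
```

Nothing in the pattern ties group 2 to group 1, so the second (.) is free to capture the same character.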

Matching a run of five identical characters is not what I want, but I couldn't find the correct syntax to exclude one group's character from being matched in another group. My closest attempt doesn't seem to work:

(.)\1{2}((?:\1))\2{1}


1st Capturing Group (.)
. matches any character (except for line terminators)
\1{2} matches the same text as most recently matched by the 1st capturing group
{2} Quantifier — Matches exactly 2 times
2nd Capturing Group ((?:\1))
Non-capturing group (?:\1)
\1 matches the same text as most recently matched by the 1st capturing group
\2{1} matches the same text as most recently matched by the 2nd capturing group
{1} Quantifier — Matches exactly one time (meaningless quantifier)

Any hint here? Thank you so much!


There are 2 answers

Wiktor Stribiżew (BEST ANSWER)

To avoid matching qwuiuQQQQQsas you need to use a negative lookahead rather than a non-capturing group:

(.)\1{2}((?!\1).)\2
         ^^^^^^

See the regex demo.

The (?!\1) negative lookahead "restricts" the . pattern so that it only matches a character other than the one captured in Group 1.

Non-capturing groups do not restrict any patterns. They only group subpatterns, and those subpatterns still consume text. Lookaheads are zero-width assertions: they do not consume text and only check whether text matching their pattern is present at the current position.
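
As a quick check of that difference, here is a minimal Python sketch that runs the question's non-capturing attempt and the lookahead version on both sample strings. It assumes Python's re module, which the thread does not specify, and the names ATTEMPT, FIXED and find are illustrative only.

```python
import re

# The question's attempt: (?:\1) still consumes a copy of group 1,
# so this pattern can only match five identical characters in a row.
ATTEMPT = re.compile(r"(.)\1{2}((?:\1))\2{1}")

# The lookahead version: (?!\1) consumes nothing; it only checks that the
# next character differs from group 1 before "." captures it.
FIXED = re.compile(r"(.)\1{2}((?!\1).)\2")

def find(pattern, text):
    """Return the matched substring, or None when the pattern does not match."""
    m = pattern.search(text)
    return m.group(0) if m else None

for text in ("wuzDDDFFFxji", "qwuiuQQQQQsas"):
    print(text, find(ATTEMPT, text), find(FIXED, text))
```

With the lookahead, wuzDDDFFFxji should yield DDDFF and qwuiuQQQQQsas should yield no match. The non-capturing attempt behaves the other way round.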

Dmitry Egorov

I would suggest using a "\1 not followed by \1" pattern:

(.)\1+(?!\1)(.)\2+

Demo: https://regex101.com/r/QkqpzS/1
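
A small Python sketch of this variant, again assuming Python's re module (not stated in the thread) and using hypothetical names. Note that \1+ and \2+ accept runs of two or more characters, so this pattern is looser than the exact "three, then at least two" requirement in the question.

```python
import re

# Two or more of one character, then two or more of a different character.
# (?!\1) ensures the first run is exhausted before group 2 begins.
LOOSE = re.compile(r"(.)\1+(?!\1)(.)\2+")

def find_runs(text):
    """Return the matched substring, or None when there is no match."""
    m = LOOSE.search(text)
    return m.group(0) if m else None

print(find_runs("wuzDDDFFFxji"))   # both runs are consumed greedily
print(find_runs("qwuiuQQQQQsas"))  # only one run of Q, so nothing matches
print(find_runs("zAABBz"))         # runs of length two also match
```

If exactly three of the first character are required, replacing \1+ with \1{2} keeps the lookahead idea while restoring the original count.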