I'm using Python 3.4.
Suppose we have four groups composed of regular expressions:
g1 = 'g11|g12|...|g1m'
g2 = 'g21|g22|...|g2n'
g3 = 'g31|g32|...|g3p'
g4 = 'g41|g42|...|g4q'
For example, g1 might be 'chickens|horses|bonnet(?>!blue )'. The groups are disjoint: no element belongs to more than one of the four groups. Each group can have any number of elements greater than 1.
I want to match a string if and only if it contains an instance of group 1 such that either:
- no instance of any of groups 1-4 precedes that instance of group 1, or
- the instance of any of groups 1-4 that immediately precedes that instance of group 1 does not belong to group 2.
Some strings on which I want to match:
'g11'
'g31 g11'
'g41g11'
'g11 g21 g11'
(The second instance of g11 violates rule 2. The first instance of g11 does not, and moreover rule 1 is satisfied.)
'anything or nothing g11 anything or nothing'
'anything or nothing g31 anything or nothing g11'
Some strings on which I don't want to match:
'g31 g21 g11'
'g21 g11 g31'
'anything or nothing g21 anything or nothing g11 anything or nothing'
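To pin the rule down, here is a minimal reference check written as a plain Python scan rather than a single regex. It is only a sketch: the group contents (`g11|g12` and so on) are placeholders, the names `TOKEN` and `should_match` are made up for illustration, and it assumes elements of different groups never overlap in the text.

```python
import re

# Placeholder alternations (assumed); the real group contents would go here.
G1 = r'g11|g12'
G2 = r'g21|g22'
G3 = r'g31|g32'
G4 = r'g41|g42'

# One named group per family, so m.lastgroup tells us which family matched.
TOKEN = re.compile(
    r'(?P<g1>{})|(?P<g2>{})|(?P<g3>{})|(?P<g4>{})'.format(G1, G2, G3, G4)
)

def should_match(text):
    """Return True if some group-1 instance is not immediately preceded
    (among instances of groups 1-4) by a group-2 instance."""
    previous = None  # family of the most recent instance seen, if any
    for m in TOKEN.finditer(text):
        if m.lastgroup == 'g1' and previous != 'g2':
            return True
        previous = m.lastgroup
    return False
```

With these placeholders, `should_match('g11 g21 g11')` is true because of the first g11, while `should_match('g31 g21 g11')` is false.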
What I've tried so far:
I tried
(g31|g32)(?=.*?(g11|g12))(?!.*?(g21|g22))
which works for 'g31 g11' and 'g31 g21 g11' but fails if there is a g21 or g22 after g11, as in 'g31 g11 g21'.
I've also tried
'(g31|g32).*?(g21|g22){0}.*?(g11|g12)'
which works for 'g31 g11' and 'g31 g21 g11' but not 'g31 g31 g21 g11'.
You can try this. See the demo:
https://regex101.com/r/hI0qP0/16
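Since the pattern itself sits behind the link, here is a sketch of one single-regex approach that satisfies the stated rule. It is not necessarily the pattern used in the demo. The group contents are placeholder assumptions, `ANY`, `PATTERN` and `matches` are illustrative names, and the sketch assumes elements of different groups do not overlap. The idea: start either at the beginning of the string or at an element of group 1, 3 or 4, then allow only filler that never begins a group element, then require a group-1 element.

```python
import re

# Placeholder alternations (assumed); substitute the real group contents.
G1 = r'g11|g12'
G2 = r'g21|g22'
G3 = r'g31|g32'
G4 = r'g41|g42'

# Any element of any of the four groups.
ANY = r'(?:{}|{}|{}|{})'.format(G1, G2, G3, G4)

PATTERN = re.compile(
    r'(?:^|{g1}|{g3}|{g4})'   # string start, or a preceding element not from group 2
    r'(?:(?!{any}).)*'        # filler: no position here may begin a group element
    r'(?:{g1})'               # the group-1 instance being looked for
    .format(g1=G1, g3=G3, g4=G4, any=ANY),
    re.DOTALL,                # let the filler span newlines
)

def matches(text):
    """True if the text contains a qualifying group-1 instance."""
    return PATTERN.search(text) is not None
```

Used with `re.search`, this accepts 'g31 g11 g21' and 'g11 g21 g11' and rejects 'g31 g21 g11' and 'g31 g31 g21 g11'. Group 2 is deliberately absent from the first alternative, so a group-1 element directly after a group-2 element can never complete a match.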