Does this algorithm correctly transform lookahead to lookbehind assertions in regex?

147 views Asked by At

I read this article http://nikic.github.io/2012/06/15/The-true-power-of-regular-expressions.html which discusses pcre and it's relation to the chomsky hierarchy.

He discusses context sensitive grammar's and makes the point that pcre definitely supports some context sensitive languages, but it's unclear whether it supports all context sensitive languages. He continues to offer

(?<A> (?<= α ) γ (?= β ) )

as a possible transliteration of the context sensitive grammar rule into pcre, but notes that lookbehinds do not support variable widths. The implication seems to be, had pcre supported variable width lookbehinds, then pcre could encode all context sensitive languages.

*disclaimer: I am a regex amateur.

This made me wonder if lookbehinds could be transformed into lookaheads (as presumably lookaheads do not suffer from this fixed width limitation).

Then I wrote this regex as a possible alternative implementation of a context sensitive grammar,

(?(?= a) (?: a)(?<A> γ)(?= β))

As for a more concrete case I tried,

(?(?=ab+)(?:ab+)(?<A>[def]{1,5})(?=xyz))

As well as,

(?(?=ab+)(?:ab+)([def]{1,5})(?=xyz))

As it wasn't immediately clear to me if there was a requirement about how the ``A'' capture took place (does it need to be the outermost 'grouping'?).

Both of these regex match strings like 'abbdefxyz' but not strings like 'abbadefxyz' and 'zdefxyz'.

0

There are 0 answers