I'm working on a tool for manipulating context-free languages, and the internal representation of a grammar is stored as a finite automaton. Looking farther into EBNF and RegEx, I learned that EBNF has "exceptions" and RegEx has negative lookahead. I can see how these might be modeled by a symmetric difference NFA, but had suspected they are beyond the capabilities of a regular DFA or NFA.
But then I came across this which pretty plainly suggests that I was wrong. Can anyone point to a free resource that might show how to convert an EBNF with exceptions, or a RegEx with negative lookahead, into a DFA?
You can replace a negative lookahead with a positive lookahead on the full set of possible matches minus the negative lookahead match. E.g. if we were working with single characters a-z as the match space,
/what(?!n)/
is the same as/what(?=[a-mo-z]|$)/
(the$
is needed as end-of-string is one of the possible matches). This is OK in theory but not so great in practice when dealing with larger matches, like/afraid of (?!chinese)/
.https://cs.stackexchange.com/questions/2557/how-to-simulate-backreferences-lookaheads-and-lookbehinds-in-finite-state-auto gives a good treatment of converting lookaheads into something DFA-like, with an important caveat at the end.