Everything I've read suggests Treetop backtracks like regular expressions, but I'm having a hard time making that work.
Suppose I have the following grammar:
grammar TestGrammar
rule open_close
'{' .+ '}'
end
end
This does not match the string {abc}
. I suspect that's because the .+
is consuming everything from the letter a
onwards. I.e. it's consuming abc}
when I only want it to consume abc
.
This appears different from what a similar regex does. The regex /{.+}/
will match {abc}
. It's my understanding that this is possible because the regex engine backtracks after consuming the closing }
as part of the .+
and then failing to match.
So can Treetop do backtracking like that? If so, how?
I know you can use negation to match "anything other than a }
." But that's not my intention. Suppose I want to be able to match the string {ab}c}
. The tokens I want in that case are the opening {
, a middle string of ab}c
, and the closing }
. This is a contrived example, but it becomes very relevant when working with nested expressions like {a b {c d}}
.
Treetop is an implementation of a Parsing Expression Grammar parser. One of the benefits of PEGs is their combination of flexibility, speed, and memory requirements. However, this balancing act has some tradeoffs.
Quoting from the Wikipedia article:
(Emphasis mine.)
In short: while certain PEG operators can backtrack in an attempt to take another route, the
+
operator cannot.Instead, in order to match nested sub-expressions, you want to create an alternation between the delimited sub-expression (checked first) followed by the non-expression characters. Something like (untested):