I need to parse a simple language that I didn't design, so I can't change the language. I need the results in C#, so I've been using TinyPG because it's so easy to use, and doesn't require external libraries to run the parser.
Things had been going pretty well, until I ran into this construct in the language. (This is a simplified version, but it does show the problem):
EOF -> @"^\s*$";
[Skip] WHITESPACE -> @"\s+";
LIST -> "LIST";
END -> "END";
IDENTIFIER -> @"[a-zA-Z_][a-zA-Z0-9_]*";
Expr -> LIST IDENTIFIER+ END;
Start -> (Expr)+ EOF;
The resulting parser cannot parse this:
LIST foo BAR Baz END
because it greedily lexes END as an IDENTIFIER
, instead of properly as the END
keyword.
So, Here are my questions:
Is this grammar ambiguous or wrong for LL(1) parsing? Or is this a bug in TinyPG?
Is there any way to redesign the grammar such that
TinyPG
will properly parse the example line?Are there any other suggestions for a simple parser that outputs code in
C#
and doesn't require additional libraries? I've looked atLLLPG
andANTLR4
, but found them much more troublesome thanTinyPG
.
You might be the same guy since the issue looks identical, as the one I answered on GitHub, but here it is again for people who google this issue.
Here is an example from the Simple-CIL-compiler project, The identifier has to catch single words except the ones listed, which means you have to include the exception token's in to the identifier
Hope that helps.
(Link to Original post)