Say I have a simple language to parse in nearley that consists only of strings, e.g. "this is a string":
string -> "\"" chars "\""
However, a string can contain code within curly braces. To keep things simple, let's say the code
can only be another string, e.g. "this is a string with {"code"}":
code -> "{" string "}"
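How do I define the new string rule in nearley so that it includes the code
definition? I keep ending up with a huge number of results, because chars
can match one or more characters, so a run of characters can be split between adjacent chars in many different ways: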
string -> "\"" charCode "\""
charCode -> (chars | code) charCode
          | (chars | code)
code -> "{" string "}"
chars -> char chars
       | char
char -> [^{}]
Ideally I'd be able to turn something like this "chars {"code"} chars chars {"code"} chars"
into an array ["chars ", "code", " chars chars ", "code", " chars"]
Perhaps it's only possible to do this using regex and moo, as suggested in the answer to "[Nearley]: how to parse matching opening and closing tag"? (The opening and closing tags are less ambiguous in that example, and I'm not experiencing the same issues.)
I'd use a regex-based lexer, certainly. But you could try to write an unambiguous grammar, based on the observation that you can never have two adjacent chars in a charCode.
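The grammar that originally followed here did not survive, so what follows is only one possible reading of that observation, an untested sketch rather than the exact rules that were posted. The helper rule codeFirst is a hypothetical name introduced for illustration; everything else is taken from the question. The idea is that a run of chars may only be followed by a code, so a run of characters can never be split across two chars:

```
string    -> "\"" charCode "\""

# A chars run is always followed by a code (or by the end of the
# charCode), never by another chars run.
charCode  -> chars
           | chars codeFirst
           | codeFirst

# codeFirst (hypothetical helper) is a charCode that starts with a code.
codeFirst -> code
           | code charCode

code      -> "{" string "}"

chars     -> char chars
           | char

# Kept as in the question; note that [^{}] also matches a double quote.
char      -> [^{}]
```

As in the question, this does not accept an empty string, and it has no postprocessors, so producing the flat array from the question would still need postprocessing on top of it.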
Another possibility is to use nearley's EBNF operators.
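The EBNF grammar that originally followed here is also missing. A minimal sketch of what such a version might look like, assuming nearley's `:?`, `:*` and `:+` modifiers and the rule names from the question, is:

```
string   -> "\"" charCode "\""

# An optional leading chars run, then any number of codes, each
# optionally followed by a single chars run.
charCode -> chars:? (code chars:?):*

code     -> "{" string "}"

# One or more characters, as a single run.
chars    -> char:+

char     -> [^{}]
```

Unlike the grammar in the question, this version also accepts an empty charCode, so an empty string would parse. If that is not wanted, the rule would need an extra constraint.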
You'll probably have to play with that a bit to get it right. I don't use nearley much.