I've got a problem with an ambiguous parse in Instaparse. Here's the grammar:
(def yip-shape
  (insta/parser
    (str/join "\n"
      ["S = ( list-item | heading | text-block )*"
       ;; lists and that
       "list-item = list-level <ws> anything"
       "list-level = #' {0,3}\\*'"
       ;; headings
       "heading = heading-level <ws> ( heading-keyword <ws> )? ( heading-date <ws> )? anything <eol?>"
       "heading-level = #'#{1,6}'"
       "heading-date = <'<'> #'[\\d-:]+' <'>'>"
       "heading-keyword = 'TODO' | 'DONE'"
       "text-block = anything*"
       "anything = #'.+'"
       "<eol> = '\\r'? '\\n'"
       "<ws> = #'\\s+'"])))
The problem is with a heading like "## TODO Done". I can understand why the ambiguity exists; I'm just not sure of the best way to solve it. For example:
(insta/parses yip-shape "## TODO Done.")
Produces:
([:S [:text-block [:anything "## TODO Done."]]]
 [:S [:heading [:heading-level "##"] [:anything "TODO Done."]]]
 [:S [:heading [:heading-level "##"] [:heading-keyword "TODO"] [:anything "Done."]]])
The last of these is the result I'm looking for. How can I best eliminate the ambiguity and narrow the result down to that one?
Grammars are for parsing structured data. If you take an otherwise-reasonable grammar and throw an "any old junk" rule into it, you will get a lot of parses that involve any old junk. The way to resolve the ambiguity is to be more stringent about what qualifies in your "anything" rule, or better yet to remove it entirely and instead actually parse the stuff that goes there.
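As a rough illustration of "being more stringent", here is a minimal sketch that uses Instaparse's negative lookahead (`!`) so that free text can no longer swallow the structured parts. It is not a drop-in replacement for the grammar in the question: the rule names `title` and `text-line` are hypothetical stand-ins for `anything` and `text-block`, list items are left out for brevity, whitespace is narrowed to spaces and tabs, and the date character class is written as `[\d:-]` so the hyphen is unambiguously literal.

```clojure
(ns yip.sketch
  (:require [instaparse.core :as insta]
            [clojure.string :as str]))

;; Sketch only: headings and plain text lines, no list items.
;; `title` and `text-line` are hypothetical names, not from the question.
(def yip-shape-strict
  (insta/parser
    (str/join "\n"
      ["S = ( heading | text-line )*"
       "heading = heading-level <ws> ( heading-keyword <ws> )? ( heading-date <ws> )? title <eol?>"
       "heading-level = #'#{1,6}'"
       "heading-date = <'<'> #'[\\d:-]+' <'>'>"
       "heading-keyword = 'TODO' | 'DONE'"
       ;; A title may not begin with something that would parse as a
       ;; keyword or a date, so the optional parts cannot be skipped
       ;; and then absorbed into the title.
       "title = !( ( heading-keyword | heading-date ) ws ) #'.+'"
       ;; A text line may not begin like a heading.
       "text-line = !( heading-level ws ) #'.+' <eol?>"
       "<eol> = '\\r'? '\\n'"
       "<ws> = #'[ \\t]+'"])))

;; insta/parses returns every possible parse, so a single-element
;; result means the input is no longer ambiguous.
(def strict-result
  (insta/parses yip-shape-strict "## TODO Done."))
;; => ([:S [:heading [:heading-level "##"] [:heading-keyword "TODO"] [:title "Done."]]])
```

The lookaheads are what remove the two unwanted parses: the whole line cannot be a `text-line` because it starts with a heading level, and `TODO Done.` cannot be a `title` because it starts with a keyword followed by whitespace. Note that "Done." is still accepted as a title, since the keyword literals are case-sensitive.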