I'm writing a pest grammar for the fountain.io syntax, which includes two different varieties of comment-like elements: [[ notes ]] set off with double-square brackets and /* boneyard */ elements delimited with a C-style comment syntax.
I'm trying to satisfy the test cases for Boneyards and Notes from the reference implementation, and I'm running into problems with the internal cases.
By way of example:
file = { ( boneyard | note | generic_line | blank_line)+ ~ EOI }
// Basics
ws = _{ (" " | "\t") }
start = _{ SOI | NEWLINE }
bn_open = { ("[[" | "/*") }
blank_line = { NEWLINE }
head = { !(NEWLINE | ws | bn_open) ~ ANY }
tail = { !(NEWLINE | bn_open) ~ ANY }
text = { bn? ~ head ~ bn? ~ (tail ~ bn?)* }
generic_line = { start ~ text ~ &NEWLINE }
// Boneyards & Notes
bn = { boneyard | note }
boneyard = { "/*" ~ boneyard_txt ~ "*/" }
boneyard_txt = @{ (boneyard | !"*/" ~ ANY)* }
note_txt = @{ (note | !"]]" ~ ANY)* }
note = { "[[" ~ note_txt ~ "]]" }
This almost does what I want, but as you can see in the pest editor, it splits each text character and makes the output very noisy:
- file
  - generic_line > text
    - head: "A"
    - tail: " "
    - tail: "l"
    - tail: "i"
    - tail: "n"
    - tail: "e"
    - tail: "."
  - blank_line: "\n"
  - blank_line: "\n"
  - note > note_txt: "A note."
  - blank_line: "\n"
  - blank_line: "\n"
  - note > note_txt: "This note spans\n multiple lines."
  - blank_line: "\n"
  - generic_line > text
    - head: "T"
    - tail: "h"
    - tail: "i"
    - tail: "s"
    - tail: " "
    - tail: "i"
    - tail: "s"
    - tail: " "
    - tail: "a"
    - tail: "n"
    - tail: " "
    - bn > note > note_txt: "internal"
    - tail: " "
    - tail: "n"
    - tail: "o"
    - tail: "t"
    - tail: "e"
    - tail: "."
  - blank_line: "\n"
  - EOI: ""
If I make the text rule atomic, text = @{ bn? ~ head ~ bn? ~ (tail ~ bn?)* }, then the results are cleaner and more readable, closer to how I'd like to actually use them:
- file
  - generic_line > text: "A line."
  - blank_line: "\n"
  - blank_line: "\n"
  - note > note_txt: "A note."
  - blank_line: "\n"
  - blank_line: "\n"
  - note > note_txt: "This note spans\n multiple lines."
  - blank_line: "\n"
  - generic_line > text: "This is an [[internal]] note."
  - blank_line: "\n"
  - EOI: ""
But sadly that causes the [[ internal ]] notes and boneyards to be miscategorized as generic text lines. I also tried making text a compound-atomic ($) rule, but that didn't make any difference from a non-atomic rule in this case.
Does anyone have any suggestions here? Do I have any options besides taking the character-by-character output and concatenating them all in application code?
After a few attempts, I think I've got a result that's pretty close to what I wanted. This grammar...
Produces output like...
The trick here is making text into a container that dispatches between either text_fragment or boneyard/notes comment types. So text_fragment becomes the atomic rule, but if we make it silent by prefixing with _, then the output is reasonably clean.
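For readers who want to see the container idea spelled out, here is a minimal sketch of how the changed rules might look. It is not the exact grammar from this answer. The body of text_fragment, the decision to drop the head/tail split, and the choice to put the silent _ modifier on bn are all assumptions made for illustration. Every other rule (file, start, bn_open, generic_line, boneyard, note and so on) is taken to be unchanged from the question.

```pest
// Sketch only: the rule bodies below are assumptions, not the answer's exact grammar.

// `text` is a plain (non-atomic) container, so its children still show up as tokens.
// Each child is either a comment or a run of ordinary characters.
text = { (bn | text_fragment)+ }

// Atomic: the whole run is reported as one token instead of one token per character.
// The run stops at a newline or at anything that opens a note or boneyard.
text_fragment = @{ (!(NEWLINE | bn_open) ~ ANY)+ }

// Silent wrapper (assumed placement of `_`): `note` and `boneyard` then appear
// directly under `text` without an extra `bn` level in the output.
bn = _{ boneyard | note }
```

Under these assumptions, a line such as This is an [[internal]] note. should yield three children under text: a text_fragment for the leading run, the note, and a text_fragment for the trailing run. Unlike the original head rule, this sketch does not reject lines that begin with whitespace, so that check would need to be added back if it matters.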