TatSu sometimes ignores square brackets ([, ], or a mix of the two) and at other times recognizes them, for reasons I don't understand. The example below shows this. I'm experimenting with TatSu 5.10.1, Python 3.11.6, and Linux 6.5.7, in case that is relevant.
I aim to render a subset of Markdown, but I'll start with a simplified grammar to discuss the issue.
(I'm using a unit separator as a rare character since other ways to disable whitespace handling were more confusing. If there's a more straightforward and reliable way to tell TatSu to recognize the whitespace as characters it should treat as a part of the text, that'll be useful to know, too.)
@@grammar::Markdown
@@whitespace :: /[␟]/
start = pieces $ ;
text = text:/[a-z]+/ ;
pieces = {text}*
;
The test code below makes TatSu ignore the [] instead of failing with an error.
If I set markdown_str to something else, like () or {}, TatSu fails.
Individual square brackets, [ or ], don't raise an exception either.
import tatsu

with open("./grammar.txt", "r") as grammar_file:
    grammar = grammar_file.read()

class MarkdownSemantics:
    def pieces(self, ast):
        return ''.join(ast)

parser = tatsu.compile(grammar)
markdown_str = "[]"
ast = parser.parse(markdown_str, semantics=MarkdownSemantics())
print(ast)
I expect this to be a bug, as I don't see what's so special about the square bracket characters. They are not defined as part of the whitespace to be ignored, and similar characters such as () and {} are rejected.
At the same time, I am told here that it's about learning parsing principles. Is my EBNF above allowing [ or ] to pass?
Your example code does not work: the semantics class definition expects the argument to pieces() to be a list of strings, but it is not.

Anyhow, the issue is with your whitespace definition. Contrary to what the documentation says, the @@whitespace directive in the grammar definition is interpreted as a list of characters to skip over between tokens (at least this is how I read the TatSu source code). Therefore, your grammar definition skips over [ and ].
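The effect can be reproduced without TatSu. This is only a minimal sketch of my reading of the source, assuming the directive text is escaped and wrapped in a regex character class; build_skip_re is a hypothetical stand-in, not a TatSu function.

```python
import re


def build_skip_re(chars: str) -> re.Pattern:
    # Hypothetical helper (not TatSu's API): treat the directive text as a
    # plain set of characters, escape it, and wrap it in a character class.
    return re.compile("[" + re.escape(chars) + "]+")


# The text between the slashes in the question's grammar: [, U+241F, ]
skip = build_skip_re("[\u241f]")

for sample in ("[]", "[", "]", "()", "{}"):
    skipped = skip.fullmatch(sample) is not None
    outcome = "skipped as whitespace" if skipped else "left for the grammar"
    print(f"{sample!r}: {outcome}")
```

Under that assumption the brackets become members of the skip set along with the unit separator. An input of [] is then consumed entirely as whitespace, pieces matches zero text items, and $ succeeds, which would explain why no error is raised while () and {} still fail.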
To disable whitespace handling, you can assign None or False to the @@whitespace directive: