I have a problem with my EBNF grammar and its Tatsu implementation. Here is an extract of the EBNF grammar for Tatsu:
define ='#define' constantename [constante] ;
constante = CONSTANTE ;
CONSTANTE = ( Any | ``true`` ) ;
Any = /.*/ ;
constantename = (/[A-Z0-9_()]*/) ;
When I test with:
#define _TEST01_ "test01"
#define _TEST_
#define _TEST02_ "test02"
I get:
[
"#define",
"_TEST01_",
"\"test01\""
],
[
"#define",
"_TEST_",
"#define _TEST02_ \"test02\""
]
But I want this:
[
"#define",
"_TEST01_",
"\"test01\""
],
[
"#define",
"_TEST_",
"true"
],
[
"#define",
"_TEST02_",
"\"test02\""
]
Where is my mistake?
Thanks a lot...
The problem is that Tatsu skips whitespace, including newlines, between elements by default. So when you apply the rule
'#define' constantename [constante]
to the input:
#define _TEST_
#define _TEST02_ "test02"
it first matches #define with '#define', then skips the space, then matches _TEST_ with constantename, then skips the newline, and then matches #define _TEST02_ "test02" with Any (via constante).
Note that that's exactly the behaviour you'd want (I assume) if the newline weren't there:
#define _TEST_ #define _TEST02_ "test02"
Here you'd want the output ["#define", "_TEST_", "#define _TEST02_ \"test02\""], right? At least the C preprocessor would handle it the same way in that case.
So what that tells us is that the newline is significant, and therefore you can't ignore it. You can tell Tatsu to only ignore tabs and spaces (not newlines) either by passing whitespace = '\t ' as an option when creating the parser, or by adding this line to the grammar:
@@whitespace :: /[\t ]+/
Now you'll need to explicitly mention newlines everywhere a newline should go, so your rule becomes:
define = '#define' constantename [constante] '\n' ;
Now it's clear that the constant, if present, should appear before the line break, so for the line
#define _TEST_
Tatsu would realize that there is no constant.
Note that you'll also want a rule to match empty lines, so that empty lines aren't syntax errors.
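To see the effect of the whitespace setting in isolation, here is a minimal sketch that emulates the define rule with Python's standard re module. It is not Tatsu and does not use Tatsu's API; the function parse_defines and its parameters skip and newline_ends_rule are hypothetical names made up for this illustration. It also uses .+ instead of the grammar's .* so that an absent value can be told apart from an empty match.

```python
import re

NAME = re.compile(r'[A-Z0-9_()]*')  # same pattern as constantename
ANY = re.compile(r'.+')             # like Any, but non-empty; "." stops at a newline

SOURCE = '#define _TEST01_ "test01"\n#define _TEST_\n#define _TEST02_ "test02"\n'


def parse_defines(text, skip, newline_ends_rule):
    """Hand-rolled emulation of: '#define' constantename [constante].

    skip is a regex for what is silently skipped between elements;
    newline_ends_rule says whether each rule must end with an explicit newline.
    """
    skip_re = re.compile(skip)
    pos = skip_re.match(text, 0).end()
    results = []
    while pos < len(text):
        if not text.startswith('#define', pos):
            raise ValueError(f'expected #define at offset {pos}')
        pos = skip_re.match(text, pos + len('#define')).end()
        name = NAME.match(text, pos)
        pos = skip_re.match(text, name.end()).end()
        value = ANY.match(text, pos)  # the optional [constante]
        if value:
            pos = value.end()
        results.append(['#define', name.group(), value.group() if value else 'true'])
        if newline_ends_rule:
            if not text.startswith('\n', pos):
                raise ValueError(f'expected newline at offset {pos}')
            pos += 1
        pos = skip_re.match(text, pos).end()
    return results


# Newlines skipped like any other whitespace: the third line is swallowed as a value.
default_like = parse_defines(SOURCE, r'\s*', newline_ends_rule=False)

# Only tabs and spaces skipped, newline explicit: three separate defines.
newline_aware = parse_defines(SOURCE, r'[ \t]*', newline_ends_rule=True)
```

With the sample input from the question, default_like reproduces the two-entry result shown in the question, and newline_aware yields the three entries that were wanted. The sketch requires every define to end with a newline, so it has the same limitation mentioned above: an empty line would be an error unless a separate case handles it.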