Is there a way to do context-sensitive parsing in TatSu?

Context-sensitive '%' ... eol comments

I'm starting with the grammar for PDF described here

https://github.com/caradoc-org/caradoc/blob/master/doc/grammar/grammar.pdf

which seems to lack the definition of eol comments.

PDF has end-of-line comments that start with the '%' character, except inside string_literal (and inside another rule, stream).

string_literal = "(" string_content ")";

where string_content can include the '%' character and also eol, but not "()" etc. The PDF language also has some special cases that would otherwise look like comments, e.g.

'%PDF-1.5' eol;

or

"%%EOF" [eol];

Is there a way to handle this context sensitivity in a TatSu grammar?

1 answer

Answer by Apalala

I'll stay away from "context sensitive" in this answer, because the phrase has a specific meaning in formal language theory.

PEG is perfectly capable of parsing a sub-language (say, Python string formatting expressions) within another language.

In fact, the original PEG definition does not use a tokenizer, because PEG grammars can parse the token sub-language.

If you think of sub-grammars, then the context is provided by the rule that knows that a sub-grammar has to be invoked.
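To make the idea concrete, here is a minimal hand-written sketch in plain Python (standard library only, not TatSu) of how the rule supplies the context. Comments are skipped only between tokens, the string rule consumes '%' as ordinary content, and the special forms from the question are tried before the comment rule can swallow them. The class name, the function `parse_pdf_sketch`, and the simplified token patterns are all assumptions made for illustration; they are not taken from the caradoc grammar or from the TatSu API.

```python
import re

# Simplified token patterns, assumed for illustration only.
HEADER = re.compile(r"%PDF-\d\.\d")
EOF_MARKER = re.compile(r"%%EOF")
# Mirrors a PEG rule such as  comment = !'%%EOF' '%' {!eol any}
# The negative lookahead keeps the comment rule from eating the EOF marker.
COMMENT = re.compile(r"%(?!%EOF)[^\r\n]*")
SPACE = re.compile(r"[ \t\r\n\f\x00]+")
NAME = re.compile(r"/[^\s()<>\[\]{}/%]+")
INTEGER = re.compile(r"[+-]?\d+")


class SketchParser:
    def __init__(self, text):
        self.text = text
        self.pos = 0

    def skip(self):
        """Skip whitespace and comments; only ever called between tokens."""
        while True:
            m = SPACE.match(self.text, self.pos) or COMMENT.match(self.text, self.pos)
            if m is None:
                return
            self.pos = m.end()

    def match(self, pattern):
        m = pattern.match(self.text, self.pos)
        if m is None:
            return None
        self.pos = m.end()
        return m.group()

    def string_literal(self):
        """The 'sub-grammar': inside it, '%' and eol are plain content."""
        if not self.text.startswith("(", self.pos):
            return None
        depth, i = 0, self.pos
        while i < len(self.text):
            ch = self.text[i]
            if ch == "\\":          # skip the escaped character
                i += 2
                continue
            if ch == "(":
                depth += 1
            elif ch == ")":
                depth -= 1
                if depth == 0:
                    content = self.text[self.pos + 1:i]
                    self.pos = i + 1
                    return content
            i += 1
        raise SyntaxError("unterminated string literal")

    def parse(self):
        result = []
        header = self.match(HEADER)  # tried before any comment skipping
        if header is not None:
            result.append(("header", header))
        while True:
            self.skip()
            if self.pos >= len(self.text):
                return result
            content = self.string_literal()
            if content is not None:
                result.append(("string", content))
                continue
            # Ordered choice, as in PEG: the first pattern that matches wins.
            for kind, pattern in (("eof", EOF_MARKER), ("name", NAME), ("int", INTEGER)):
                value = self.match(pattern)
                if value is not None:
                    result.append((kind, value))
                    break
            else:
                raise SyntaxError(f"unexpected input at offset {self.pos}")


def parse_pdf_sketch(text):
    return SketchParser(text).parse()


sample = "%PDF-1.5\n/Title (50% done (really)\n% still text) % a real comment\n42\n%%EOF\n"
print(parse_pdf_sketch(sample))
```

In a TatSu grammar the same effect would presumably come from writing the string rule with explicit patterns and placing the header and EOF rules ahead of any comment handling, but the exact directives to use should be checked against the TatSu documentation.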

TatSu has features that allow tokenization to happen before parsing (the Buffer class), for efficiency and convenience, but using them is not mandatory.

The only cases that cannot be handled easily as a grammar-within-a-grammar are those involving preprocessing with macro capabilities, because they require an interpretation phase before the text for the inner grammar can be parsed.
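A rough sketch of why macros need a separate phase, in plain Python with a made-up `#define NAME replacement` syntax (the syntax and the function name `expand_macros` are hypothetical, chosen only for this example): the parentheses that the inner grammar needs to see do not exist in the text until the macros have been expanded.

```python
import re

# Hypothetical macro syntax, assumed for illustration: "#define NAME replacement"
DEFINE = re.compile(r"^#define\s+(\w+)\s+(.*)$")


def expand_macros(source):
    """Interpretation phase: collect definitions, then substitute whole words."""
    macros = {}
    output = []
    for line in source.splitlines():
        m = DEFINE.match(line)
        if m:
            macros[m.group(1)] = m.group(2)
            continue

        def substitute(match):
            # Replace known macro names; leave every other word untouched.
            return macros.get(match.group(), match.group())

        output.append(re.sub(r"\w+", substitute, line))
    return "\n".join(output)


source = "#define OPEN (\n#define CLOSE )\nOPEN 1 + 2 CLOSE * 3"
print(expand_macros(source))  # prints: ( 1 + 2 ) * 3
```

Only the expanded text can be handed to the parser for the inner grammar; this simple version does not rescan replacements for further macros.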