Is it possible to use a different lexer?


I would like to use a different lexer with TatSu, yet still use TatSu's parser. Is this possible? For example, in the grammar:

expr = NUM | ID | (expr '+' expr) ;

is it possible to use an alternative lexer to provide NUM and ID?


There are 2 answers

Apalala On

In general, PEG parsers don't use a separate lexer because they don't need one. Lexical elements can be specified using the same grammar language.

TatSu, a PEG parser generator, doesn't support separate lexers either, yet the Buffer class provides facilities for avoiding partial matches of literal tokens and for specifying lexical elements using regular expressions:

expr = num | id | (expr '+' expr) ;
num = /\d+/ ;
id = /[a-zA-Z_]\w*/ ;
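To see what the two patterns above accept, here is a minimal sketch using only Python's standard `re` module. It does not call TatSu at all; the helper name `classify` is hypothetical and exists only for illustration. It tries `num` before `id`, mirroring the ordered choice in the `expr` rule.

```python
import re

# The same patterns as the num and id rules in the grammar above.
NUM = re.compile(r"\d+")
ID = re.compile(r"[a-zA-Z_]\w*")


def classify(lexeme):
    """Return the name of the lexical rule that matches the whole lexeme.

    num is tried before id, in the same order as the alternatives in expr.
    Returns None when neither pattern matches the full string.
    """
    if NUM.fullmatch(lexeme):
        return "num"
    if ID.fullmatch(lexeme):
        return "id"
    return None


print(classify("42"))       # num
print(classify("total_1"))  # id
print(classify("1abc"))     # None: neither pattern covers the whole string
```

Note that `fullmatch` is used here so that a string such as `1abc` is rejected as a whole, rather than being accepted on its `1` prefix.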
Apalala On

Recent versions of TatSu allow the use of a different lexer (called Tokenizer in TatSu).

The parser will probably have to rely on semantic actions to verify the grammar rules that correspond to tokens.

There are some unfinished experiments from my work helping with the Python PEG parser at https://github.com/neogeny/pygl.
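The general idea of a separate lexer feeding a parser can be sketched with the standard library alone. This is an illustration of the concept for the `expr` grammar from the question, not TatSu's Tokenizer interface: the names `Token`, `tokenize`, and `parse_expr` are hypothetical, and the parser is a small hand-written one that checks each token's kind, roughly the role the semantic actions would play.

```python
import re
from typing import Iterator, NamedTuple


class Token(NamedTuple):
    kind: str  # "NUM", "ID", or "PLUS"
    text: str


# One alternative per token kind; leading whitespace is skipped.
TOKEN_RE = re.compile(r"\s*(?:(?P<NUM>\d+)|(?P<ID>[a-zA-Z_]\w*)|(?P<PLUS>\+))")


def tokenize(text: str) -> Iterator[Token]:
    """Stand-alone lexer: turn the input text into a stream of tokens."""
    text = text.rstrip()
    pos = 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise SyntaxError(f"cannot tokenize input at position {pos}")
        kind = match.lastgroup
        yield Token(kind, match.group(kind))
        pos = match.end()


def parse_expr(tokens):
    """Parse NUM or ID operands joined by '+', left-associatively."""
    tokens = list(tokens)
    pos = 0

    def atom():
        nonlocal pos
        # Verify that the token supplied by the lexer is of an expected kind.
        if pos < len(tokens) and tokens[pos].kind in ("NUM", "ID"):
            token = tokens[pos]
            pos += 1
            return token
        raise SyntaxError("expected NUM or ID")

    node = atom()
    while pos < len(tokens) and tokens[pos].kind == "PLUS":
        pos += 1
        node = (node, "+", atom())
    if pos != len(tokens):
        raise SyntaxError("unexpected token after expression")
    return node


print(parse_expr(tokenize("x + 42")))
```

The parser here only ever sees `Token` objects, so the lexer could be swapped for any other source of tokens with the same `kind` and `text` fields.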