Parsing an indentation-based syntax (like Python) with YECC

240 views Asked by At

I have the following piece of code:

case 1
of 2
    3
of 3
    4
5

That my custom tokenizer translates to:

Tokens: [{'case',1},
         {integer,1,1},
         {eol,1},
         {'of',1},
         {integer,1,2},
         {block,[{integer,1,3}]},
         {eol,1},
         {'of',1},
         {integer,1,3},
         {block,[{integer,1,4}]},
         {eol,1},
         {integer,1,5}]

But then I'm not being able to parse it with the following Yecc:

Nonterminals
    grammar
    statements statement
    case_def case_conditions condition.


Terminals
    eol block
    integer
    case of.


Rootsymbol grammar.


grammar -> statements : '$1'.


statements -> statement eol statements : ['$1'|'$3'].
statements -> statement : ['$1'].

statement -> case_def : '$1'.
statement -> integer : '$1'.


case_def -> 'case' integer case_conditions : ''.

case_conditions -> case_condition case_conditions : ['$1'|'$2'].
case_conditions -> case_condition : ['$1'].

case_condition -> eol 'of' integer block : ''.

It gives me the following output:

["syntax error before: ","5"]

Any help is really welcome, thanks.

1

There are 1 answers

1
tkowal On BEST ANSWER

I think in nonterminal list, you should have case_condition instead of condition.

Your custom scanner ignores the indentation. It has to emit tokens for INDENT and DEDENT. I've found example in yacc. Then you can change your grammar to use those tokens.

Your example generates shift/reduce conflict. Documentation says, that:

Shift/reduce conflicts are resolved in favor of shifting if there are no operator precedence declarations.

It means, that when your parser is at the place denoted by |

of 3
    4|
5

and sees new line, there might be two options:

  • this might be next case_condition so parser needs to keep going "shift" to read more
  • this might be end of statement so parser needs to treat everything before as statement and put it on stack or in short "reduce"

Because those conflicts are always resolved to shift, you can't put next statement after you started case! You need to change your grammar. Work with it until yecc:file/1 does not generate any warnings.

HINT: Indent everything inside case like this:

case 1
    of 2
        3
    of 3
        4
5

This way, you are clearly saying, that case is one statement and 5 is another. My knowledge about parsers got a little rusty, but I believe, that you wouldn't be able to write grammar that is able to differentiate between case_condition and statement with left to right parser, if you don't add indentation or some kind of terminator.