How can I generate a parser with Jison which deals with grammar ambiguity?

1.4k views Asked by At

I am trying to generate a parser in JavaScript via Jison for the language ChucK, and have got off to a good start except that there are ambiguities in the language which the generated parser is unable to handle. The original ChucK compiler is generated by Bison, and that must somehow be able to resolve these ambiguities.

For the purposes of this question I've simplified the problem to a construed grammar which presents only one ambiguity. For reference I've put up a gist of all the involved files (including the generated parser). The project structure is as follows:

The grammar itself looks as follows:

grammar = {
    Program: [
        ['ProgramSection', '$$ = new yy.Program($1);']
    ],
    ProgramSection: [
        ['Expression SEMICOLON', '$$ = new yy.ExpressionStatement($1);']
    ],
    Expression: [
        ['DeclExpression', '$$ = $1;'],
        ['Expression OP DeclExpression', '$$ = new yy.ExpFromBinary($1, $2, $3);']
    ],
    DeclExpression: [
        ['TypeDecl VarDeclList', '$$ = new yy.DeclExp($1, $2, 0);'],
        ['PrimaryExpression', '$$ = $1;']
    ],
    VarDeclList: [
        ['VarDecl', '$$ = new yy.VarDeclList($1);']
    ],
    VarDecl: [
        ['ID', '$$ = new yy.VarDecl($1);']
    ],
    TypeDecl: [
        ['ID', '$$ = new yy.TypeDecl(new yy.IdList($1), 0);']
    ],
    PrimaryExpression: [
        ['ID', '$$ = new yy.ExpFromId($1);']
    ]
};

The ambiguity is that the non-terminal DeclExpression can match either TypeDecl VarDeclList or PrimaryExpression. This makes Jison emit the following warning:

States with conflicts:
State 7
  TypeDecl -> ID . #lookaheads= ID SEMICOLON OP
  PrimaryExpression -> ID . #lookaheads= ID SEMICOLON OP

And the generated parser fails to parse the test code (Type var => out;) like so:

Error: Parse error on line 1: Unexpected 'SEMICOLON'

To my understanding, it's the part after the => operator that the parser tries to match against the rule TypeDecl VarDeclList.

So, how can I generate a parser that is able to deal with this ambiguity?

2

There are 2 answers

0
aknuds1 On BEST ANSWER

I've found that I can produce a functional parser for this (simplified) grammar by choosing either the 'slr' (SLR) or the 'lr' (LR1) parser type:

// Generate SLR parser, since default LALR has conflicts
exports.generate = new Parser(parserConfig, {type: "slr"}).generate;

I would still like to know however why the default (LALR(1)) won't work, as this should be what Bison generates.

0
blackshade On

The reason your grammer doesn't work with a LALR(1) parser is because your is ambiguous for a LALR(1) parser in the DeclExpression state on the TypeDecl and PrimaryExpression states.

Let me try to explain. As stated by the error message the parser detects a conflict on TypeDecl and PrimaryExpression. Both have ID as token but since the LALR(1) parser can only look one token ahead it means that the parser doesn't know what to do when it is in the DeclExpression state. SLR on the other hand has a kind of dynamic look-a-head which will resolve the conflict at the expense of some memory.

If you want to make it work on a LALR(1) parser, just refactor your DeclExpression rule to be like ID VarDeclList | ID, this way the parser doesn't have to look-a-head in order to find the correct rule.