I am trying to generate a parser in JavaScript via Jison for the language ChucK, and have got off to a good start except that there are ambiguities in the language which the generated parser is unable to handle. The original ChucK compiler is generated by Bison, and that must somehow be able to resolve these ambiguities.
For the purposes of this question I've simplified the problem to a contrived grammar that presents only one ambiguity. For reference I've put up a gist of all the involved files (including the generated parser). The project structure is as follows:
- language/lexer.js: The lexer.
- language/grammar.js: The grammar definition as well as a function (generate) to generate the parser via Jison.
- language/helpers.js: Helper functions.
- src/parser.js: The generated parser.
- testparse.js: A program that tests the parser with the following source code: Type var => out;
The grammar itself looks as follows:
grammar = {
    Program: [
        ['ProgramSection', '$$ = new yy.Program($1);']
    ],
    ProgramSection: [
        ['Expression SEMICOLON', '$$ = new yy.ExpressionStatement($1);']
    ],
    Expression: [
        ['DeclExpression', '$$ = $1;'],
        ['Expression OP DeclExpression', '$$ = new yy.ExpFromBinary($1, $2, $3);']
    ],
    DeclExpression: [
        ['TypeDecl VarDeclList', '$$ = new yy.DeclExp($1, $2, 0);'],
        ['PrimaryExpression', '$$ = $1;']
    ],
    VarDeclList: [
        ['VarDecl', '$$ = new yy.VarDeclList($1);']
    ],
    VarDecl: [
        ['ID', '$$ = new yy.VarDecl($1);']
    ],
    TypeDecl: [
        ['ID', '$$ = new yy.TypeDecl(new yy.IdList($1), 0);']
    ],
    PrimaryExpression: [
        ['ID', '$$ = new yy.ExpFromId($1);']
    ]
};
The ambiguity is that the non-terminal DeclExpression can match either TypeDecl VarDeclList or PrimaryExpression, and both alternatives begin with an ID. This makes Jison emit the following warning:
States with conflicts:
State 7
TypeDecl -> ID . #lookaheads= ID SEMICOLON OP
PrimaryExpression -> ID . #lookaheads= ID SEMICOLON OP
And the generated parser fails to parse the test code (Type var => out;) like so:
Error: Parse error on line 1: Unexpected 'SEMICOLON'
To my understanding, it's the part after the => operator that the parser tries to match against the rule TypeDecl VarDeclList.
So, how can I generate a parser that is able to deal with this ambiguity?
I've found that I can produce a functional parser for this (simplified) grammar by choosing either the 'slr' (SLR) or the 'lr' (LR(1)) parser type.
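For illustration, selecting the parser type might look roughly like the sketch below. It is not the exact code from my generate function: the name generateParser is made up here, the jison module is passed in as an argument so the snippet makes no assumption about how the project loads it, and I am assuming the rules shown above are handed to Jison under the bnf key.

```javascript
// Sketch only. `jison` is the Jison module (e.g. the result of require('jison')),
// `bnf` is the rules object shown above, and `parserType` is the algorithm name.
// generateParser is a hypothetical helper name, not part of Jison.
function generateParser(jison, bnf, parserType) {
  // The second argument carries generator options; `type` picks the algorithm.
  // 'lalr' is Jison's default, 'slr' and 'lr' are the two that worked for me.
  const parser = new jison.Parser({ bnf: bnf }, { type: parserType });

  // generate() returns the source code of the parser as a string,
  // ready to be written to a file such as src/parser.js.
  return parser.generate();
}
```

With the real module this would be called as generateParser(require('jison'), grammar, 'slr'), or with 'lr' for the LR(1) variant.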
I would still like to know, however, why the default (LALR(1)) doesn't work, as this should be the kind of parser that Bison generates.
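As a sanity check, here is a minimal sketch that computes the FOLLOW sets of the simplified grammar by hand. It is my own throwaway code and not Jison's algorithm; the names productions, computeFirst and computeFollow are made up for illustration, '$end' is just a label I chose for end of input, and the shortcut in computeFirst only holds because this grammar has no empty productions.

```javascript
// Productions of the simplified grammar, without the semantic actions.
const productions = {
  Program: [['ProgramSection']],
  ProgramSection: [['Expression', 'SEMICOLON']],
  Expression: [['DeclExpression'], ['Expression', 'OP', 'DeclExpression']],
  DeclExpression: [['TypeDecl', 'VarDeclList'], ['PrimaryExpression']],
  VarDeclList: [['VarDecl']],
  VarDecl: [['ID']],
  TypeDecl: [['ID']],
  PrimaryExpression: [['ID']]
};

const isNonTerminal = (symbol) =>
  Object.prototype.hasOwnProperty.call(productions, symbol);

// Add every element of `from` to `to`; return true if `to` grew.
function addAll(to, from) {
  const before = to.size;
  from.forEach((item) => to.add(item));
  return to.size > before;
}

// FIRST sets. With no empty productions, FIRST of a right-hand side
// is just FIRST of its first symbol.
function computeFirst() {
  const first = {};
  Object.keys(productions).forEach((nt) => { first[nt] = new Set(); });
  let changed = true;
  while (changed) {
    changed = false;
    for (const [nt, alternatives] of Object.entries(productions)) {
      for (const [head] of alternatives) {
        const source = isNonTerminal(head) ? first[head] : new Set([head]);
        if (addAll(first[nt], source)) changed = true;
      }
    }
  }
  return first;
}

// FOLLOW sets, iterated until no set grows any further.
function computeFollow(startSymbol) {
  const first = computeFirst();
  const follow = {};
  Object.keys(productions).forEach((nt) => { follow[nt] = new Set(); });
  follow[startSymbol].add('$end');
  let changed = true;
  while (changed) {
    changed = false;
    for (const [nt, alternatives] of Object.entries(productions)) {
      for (const rhs of alternatives) {
        rhs.forEach((symbol, i) => {
          if (!isNonTerminal(symbol)) return;
          const next = rhs[i + 1];
          let source;
          if (next === undefined) source = follow[nt];        // last symbol: inherit from the left-hand side
          else if (isNonTerminal(next)) source = first[next]; // followed by a non-terminal
          else source = new Set([next]);                      // followed by a terminal
          if (addAll(follow[symbol], source)) changed = true;
        });
      }
    }
  }
  return follow;
}

const follow = computeFollow('Program');
console.log('FOLLOW(TypeDecl):', [...follow.TypeDecl]);
console.log('FOLLOW(PrimaryExpression):', [...follow.PrimaryExpression]);
```

If this sketch is right, TypeDecl can only be followed by ID, while PrimaryExpression can only be followed by SEMICOLON or OP. That would be consistent with the SLR parser having no conflict in this state, and it makes the lookaheads reported in the LALR(1) warning above all the more puzzling to me.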