I've created a compiler for a language which has the following grammer, as defined by ML-Yacc (Starting symbol is "program" which is defined at the bottom):
%nonassoc FUN VAR ASSIGN PLUSASSIGN MINUSASSIGN TIMESASSIGN DIVIDEASSIGN
%right ELSE
%left OR
%left AND
%nonassoc EQ NEQ GT LT GE LE
%left PLUS MINUS
%left TIMES DIVIDE
%left UNARY
%left LPAREN
%%
const: INT
| FLOAT
| BOOL
| STRING
ty: ID
| FUN LPAREN typeList RPAREN ARROW ty
typeList: typeList'
|
typeList': ty COMMA typeList'
| ty
exp: primaryExp
| callExp
| boolExp
| opExp
| assignExp
assignExp: ID ASSIGN exp
| ID PLUSASSIGN exp
| ID MINUSASSIGN exp
| ID TIMESASSIGN exp
| ID DIVIDEASSIGN exp
tyargs: LT typeList' GT
callExp: exp LPAREN expList RPAREN
boolExp: exp AND exp
| exp OR exp
opExp: ID PLUSPLUS
| ID MINUSMINUS
| PLUSPLUS ID
| MINUSMINUS ID
| exp PLUS exp
| exp MINUS exp
| MINUS exp %prec UNARY
| BANG exp %prec UNARY
| exp TIMES exp
| exp DIVIDE exp
| exp EQ exp
| exp NEQ exp
| exp GT exp
| exp LT exp
| exp GE exp
| exp LE exp
expList: expList'
|
expList': exp COMMA expList'
| exp
primaryExp: ID
| const
| lambdaExp
| LPAREN exp RPAREN
varDecl: ty ID ASSIGN exp
| ty ID
| VAR ID ASSIGN exp
expStat: exp SEMICOLON
| SEMICOLON
statList: stat statList
|
compoundStat: LBRACE statList RBRACE
selectionStat: IF LPAREN exp RPAREN stat ELSE stat
| IF LPAREN exp RPAREN stat
jumpStat: RETURN exp
| RETURN
| BREAK
iterationStat: WHILE LPAREN exp RPAREN stat
| FOR LPAREN SEMICOLON SEMICOLON RPAREN stat
| FOR LPAREN SEMICOLON SEMICOLON exp RPAREN stat
| FOR LPAREN SEMICOLON exp SEMICOLON RPAREN stat
| FOR LPAREN SEMICOLON exp SEMICOLON exp RPAREN stat
| FOR LPAREN varDecl SEMICOLON SEMICOLON RPAREN stat
| FOR LPAREN varDecl SEMICOLON SEMICOLON exp RPAREN stat
| FOR LPAREN varDecl SEMICOLON exp SEMICOLON RPAREN stat
| FOR LPAREN varDecl SEMICOLON exp SEMICOLON exp RPAREN stat
| FOR LPAREN exp SEMICOLON SEMICOLON RPAREN stat
| FOR LPAREN exp SEMICOLON SEMICOLON exp RPAREN stat
| FOR LPAREN exp SEMICOLON exp SEMICOLON RPAREN stat
| FOR LPAREN exp SEMICOLON exp SEMICOLON exp RPAREN stat
stat: expStat
| compoundStat
| selectionStat
| iterationStat
| jumpStat SEMICOLON
| varDecl SEMICOLON
declList: declList'
|
declList': varDecl COMMA declList'
| varDecl
functionDecl: FUN ID LPAREN declList RPAREN ARROW ty compoundStat
lambdaExp: LPAREN declList RPAREN ARROW ty compoundStat
declarations: varDecl SEMICOLON declarations
| functionDecl declarations
|
program: declarations
and this grammar is fine, but now I want to introduce parametric polymorphism and thus added the following productions to the grammar:
tyargs: LT typeList' GT
ty: ID tyargs
callExp: exp tyargs LPAREN expList RPAREN
idList: ID COMMA idList
| ID
tyvars: LT idList GT
functionDecl: FUN ID tyvars LPAREN declList RPAREN ARROW ty compoundStat
And now I'm getting the following 2 reduce/reduce conflicts, which I'm not sure how to solve
error: state 75: reduce/reduce conflict between rule 46 and rule 5 on GT
error: state 75: reduce/reduce conflict between rule 46 and rule 5 on COMMA
Can anyone tell me how I would remove these 2 conflicts?
EDIT: Here's the full .desc output from mlyacc http://pastebin.com/2w26ytuV. Not that this one also shows the 2 benign shift/reduce errors
The problem is that with the new rules, the grammar requires arbitrary lookahead to tell the difference between a
varDecl
and anexpStmt
. This comes fromLT
being both a binary operator for expressions, and indicating the start of atyargs
list for a parameterized type.One possible fix would be to introduce a new keyword to denote a parameterized type or function (like the
FUN
keyword currently used to introduce a function type), which would allow the parser to know in advance whether to treat aLT
as an operator or a type parameter list. So you would instead add new rules like:Another possibility is to use lexer feedback via the symbol table -- have the lexer recognize identifiers that require type parameters (by looking up the names in the symbol table) and return a different token for them.
A third possibility is to use a stronger parser generator that can deal with more lookahead, such as bison's
%glr-parser
option, or btyacc