I have written a yacc file according to Tiger Book(appendix A, Tiger manual).
But there are still some shift/reduce conflicts. I do not know how to resolve these conflicts.
% yacc --version
bison (GNU Bison) 3.0.2
You can use this cmd to reproduce the problem:
% yacc -dvt tiger.y
tiger.y: warning: 37 shift/reduce conflicts [-Wconflicts-sr]
% cat tiger.y
:
%{
#include <stdio.h>
//#include "util.h"
//#include "errormsg.h"
int yylex(void); /* function prototype */
void yyerror(char *s)
{
EM_error(EM_tokPos, "%s", s);
}
%}
%union {
int pos;
int ival;
string sval;
}
%token <sval> ID STRING
%token <ival> INT
%token
COMMA COLON SEMICOLON LPAREN RPAREN LBRACK RBRACK
LBRACE RBRACE DOT
PLUS MINUS TIMES DIVIDE EQ NEQ LT LE GT GE
AND OR ASSIGN
ARRAY IF THEN ELSE WHILE FOR TO DO LET IN END OF
BREAK NIL
FUNCTION VAR TYPE
%right ASSIGN
%left OR
%left AND
%nonassoc EQ NEQ LT LE GT GE
%left PLUS MINUS
%left TIMES DIVIDE
%left UNARYMINUS
%precedence THEN
%precedence ELSE
%start program
%%
program: exp { }
;
exp:lvalue { }
|NIL { }
|LPAREN explist RPAREN { }
|LPAREN RPAREN {}
|INT {}
|STRING {}
|MINUS exp %prec UNARYMINUS {}
|ID LPAREN RPAREN {}
|ID LPAREN arglist RPAREN {}
|exp PLUS exp {}
|exp MINUS exp {}
|exp TIMES exp {}
|exp DIVIDE exp {}
|exp EQ exp {}
|exp NEQ exp {}
|exp LT exp {}
|exp LE exp {}
|exp GT exp {}
|exp GE exp {}
|exp AND exp {}
|exp OR exp {}
|ID LBRACE RBRACE {}
|ID LBRACE idlist RBRACE {}
|ID LBRACK exp RBRACK OF exp {}
|lvalue ASSIGN exp {}
|IF exp THEN exp ELSE exp {}
|IF exp THEN exp {}
|WHILE exp DO exp {}
|FOR ID ASSIGN exp TO exp DO exp {}
|BREAK {}
|LET decs IN END {}
|LET decs IN explist END {}
;
lvalue: ID {}
| lvalue DOT ID {}
| lvalue LBRACK exp RBRACK {}
;
explist: exp {}
| explist SEMICOLON exp {}
;
arglist:exp {}
|exp COMMA arglist {}
;
idlist:ID EQ exp {}
|ID EQ exp COMMA idlist {}
;
decs:dec {}
|decs dec {}
;
dec:tydec {}
|vardec {}
|fundec {}
;
tydec:TYPE ID EQ ty {}
;
ty:ID {}
|LBRACK tyfields RBRACK {}
|ARRAY OF ID {}
;
tyfields:/* NULL */
|notnulltyfields {}
;
notnulltyfields:ID COLON ID {}
|ID COLON ID COMMA notnulltyfields {}
;
vardec:VAR ID ASSIGN exp {}
|VAR ID COLON ID ASSIGN exp {}
;
fundec:FUNCTION ID LPAREN tyfields RPAREN EQ exp {}
|FUNCTION ID LPAREN tyfields RPAREN COLON ID EQ exp {}
;
The shift-reduce conflicts are easily discovered by looking at the
tiger.output
file produced with the-v
flag.Here's an example (I edited out the repetition):
We can see that State 88 occurs when a
WHILE
expression can be reduced (that's evident from the position of the.
in the state description:If the lookahead token at this point is a binary operator, the parser doesn't know whether to shift the operator, making the trailing
exp
in theWHILE
expression longer, or to immediately reduce theWHILE
. Obviously (to us, not tobison
), the solution is to shift.bison
doesn't know that because the productionexp: WHILE exp DO exp
doesn't have a precedence. The precedence of that production would be the precedence of its last terminal, which isDO
, so the simple solution is to define a precedence forDO
. Unsurprisingly, it should be the same as the precedence ofELSE
, as shown by the fact thatIF exp THEN exp ELSE exp .
does not produce a shift/reduce conflict.A similar problem occurs in states 112 and 129.
The shift/reduce conflict in state 1 is also clear from the
output
file:Here, the parser has just found an
ID
in a context in which anexp
might be reduced, and it faces two possibilities:shift. The
exp
isID [exp] OF exp
, so that in the end the result will be:reduce. The
exp
is the lvalueID[exp]
, using the following productions:In order to use the second alternative, the parser must immediately reduce
ID
tolvalue
, and therein lies the problem: the parser cannot know which of these two possibilities is correct until it sees theOF
following the matching ], but that's far in the future -- in fact, it could be an arbitrary number of tokens away.The solution here is to avoid forcing the parser to make a decision at this point. There are a couple of possibilities.
Since an expression can only be
ID [ exp ] OF
(and not anything more complicated), we can factorID
out of the conflict:Comparing the current state machine with the state machine after this change should make it clear how that works (and is a useful exercise in learning about bottom-up parsing).
If you don't want to go to all that work, you can simply add an "apparently redundant" production, as suggested by Appel in his textbook:
The added production to
lvalue
is obviously going to create a shift-reduce conflict; indeed, it is exactly the same shift-reduce conflict as in the original grammar. But this time, the conflict is between two different productions forlvalue
, and the default shift action is definitely the one you want to take in the case of a bareID
followed by a [. After the shift, both thelvalue
production and theexp
production are still available, so the parser doesn't have to make a decision until it finds the token after the ].The downside with this solution is that the parser generator will continue to report a shift-reduce conflict, since there clearly is one. Since shift-reduce conflicts are generally considered a sign that the grammar may be ambiguous, leaving a shift-reduce conflict in the code will be a long-term maintenance issue: after every grammar change, it will be necessary to verify that the shift-reduce conflict is benign.
Another solution, which also unfortunately leaves the warning in place, is to use bison's
%glr-parser
directive to generate a GLR parser. The GLR algorithm is capable of delaying reduction decisions by effectively maintaining two (or more) different possible parser stacks at the same time. For unambiguous grammars, this turns out to still be O(n) in the length of the input, but it is slightly slower. (Also, this option is not available in many other yacc derivatives.)Finally, you could get rid of
lvalue
by just adding its productions toexp
. You would then need to generalize thelvalue [ exp ]
to beexp [ exp ]
, which means the grammar would recognize a a superset of the original language: it will now accept certain inputs which are not valid. However, it is easy to check in the semantic actions for the relevant productions to see whether anexp
has the form of anlvalue
or not; if it is not, you can generate a syntax error in the semantic action.