Bison shift/reduce conflict - tiger compiler

5.5k views Asked by At

I have written a yacc file according to Tiger Book(appendix A, Tiger manual).

But there are still some shift/reduce conflicts. I do not know how to resolve these conflicts.

% yacc --version
bison (GNU Bison) 3.0.2

You can use this cmd to reproduce the problem:

% yacc -dvt tiger.y
tiger.y: warning: 37 shift/reduce conflicts [-Wconflicts-sr]

% cat tiger.y:

%{
#include <stdio.h>
//#include "util.h"
//#include "errormsg.h"

int yylex(void); /* function prototype */

void yyerror(char *s)
{
    EM_error(EM_tokPos, "%s", s);
}
%}


%union {
    int pos;
    int ival;
    string sval;
}


%token <sval> ID STRING
%token <ival> INT

%token
  COMMA COLON SEMICOLON LPAREN RPAREN LBRACK RBRACK
  LBRACE RBRACE DOT
  PLUS MINUS TIMES DIVIDE EQ NEQ LT LE GT GE
  AND OR ASSIGN
  ARRAY IF THEN ELSE WHILE FOR TO DO LET IN END OF
  BREAK NIL
  FUNCTION VAR TYPE


%right ASSIGN
%left OR
%left AND
%nonassoc EQ NEQ LT LE GT GE
%left  PLUS MINUS
%left  TIMES DIVIDE
%left  UNARYMINUS

%precedence THEN
%precedence ELSE


%start program

%%

program:    exp {  }
       ;

exp:lvalue {  }
   |NIL    {  }
   |LPAREN explist RPAREN {  }
   |LPAREN         RPAREN {}
   |INT {}
   |STRING {}
   |MINUS exp %prec UNARYMINUS {}
   |ID LPAREN RPAREN {}
   |ID LPAREN arglist RPAREN {}
   |exp PLUS exp {}
   |exp MINUS exp {}
   |exp TIMES exp {}
   |exp DIVIDE exp {}
   |exp EQ exp {}
   |exp NEQ exp {}
   |exp LT exp {}
   |exp LE exp {}
   |exp GT exp {}
   |exp GE exp {}
   |exp AND exp {}
   |exp OR exp {}
   |ID LBRACE RBRACE {}
   |ID LBRACE idlist RBRACE {}
   |ID LBRACK exp RBRACK OF exp {}
   |lvalue ASSIGN exp {}
   |IF exp THEN exp ELSE exp {}
   |IF exp THEN exp {}
   |WHILE exp DO exp {}
   |FOR ID ASSIGN exp TO exp DO exp {}
   |BREAK {}
   |LET decs IN END {}
   |LET decs IN explist END {}
   ;

lvalue: ID {}
      | lvalue DOT ID {}
      | lvalue LBRACK exp RBRACK {}
      ;

explist: exp {}
       | explist SEMICOLON exp {}
       ;

arglist:exp {}
       |exp COMMA arglist {}
       ;

idlist:ID EQ exp {}
      |ID EQ exp COMMA idlist {}
      ;

decs:dec {}
       |decs dec {}
       ;

dec:tydec {}
   |vardec {}
   |fundec {}
   ;

tydec:TYPE ID EQ ty {}
     ;

ty:ID {}
  |LBRACK tyfields RBRACK {}
  |ARRAY OF ID {}
  ;

tyfields:/* NULL */
        |notnulltyfields {}
        ;

notnulltyfields:ID COLON ID {}
               |ID COLON ID COMMA notnulltyfields {}
               ;

vardec:VAR ID ASSIGN exp {}
      |VAR ID COLON ID ASSIGN exp {}
      ;

fundec:FUNCTION ID LPAREN tyfields RPAREN EQ exp {}
      |FUNCTION ID LPAREN tyfields RPAREN COLON ID EQ exp {}
      ;
1

There are 1 answers

5
rici On BEST ANSWER

The shift-reduce conflicts are easily discovered by looking at the tiger.output file produced with the -v flag.

Here's an example (I edited out the repetition):

State 88

   11 exp: exp . PLUS exp
   12    | exp . MINUS exp
# ...
   29    | WHILE exp DO exp .

    PLUS    shift, and go to state 34
    MINUS   shift, and go to state 35
# ...

    PLUS      [reduce using rule 29 (exp)]
    MINUS     [reduce using rule 29 (exp)]
# ...

    $default  reduce using rule 29 (exp)

We can see that State 88 occurs when a WHILE expression can be reduced (that's evident from the position of the . in the state description:

   29    | WHILE exp DO exp .

If the lookahead token at this point is a binary operator, the parser doesn't know whether to shift the operator, making the trailing exp in the WHILE expression longer, or to immediately reduce the WHILE. Obviously (to us, not to bison), the solution is to shift. bison doesn't know that because the production exp: WHILE exp DO exp doesn't have a precedence. The precedence of that production would be the precedence of its last terminal, which is DO, so the simple solution is to define a precedence for DO. Unsurprisingly, it should be the same as the precedence of ELSE, as shown by the fact that IF exp THEN exp ELSE exp . does not produce a shift/reduce conflict.

A similar problem occurs in states 112 and 129.

The shift/reduce conflict in state 1 is also clear from the output file:

State 1

    9 exp: ID . LPAREN RPAREN
   10    | ID . LPAREN arglist RPAREN
   23    | ID . LBRACE RBRACE
   24    | ID . LBRACE idlist RBRACE
   25    | ID . LBRACK exp RBRACK OF exp
   34 lvalue: ID .

    LPAREN  shift, and go to state 15
    LBRACK  shift, and go to state 16
    LBRACE  shift, and go to state 17

    LBRACK    [reduce using rule 34 (lvalue)]
    $default  reduce using rule 34 (lvalue)

Here, the parser has just found an ID in a context in which an exp might be reduced, and it faces two possibilities:

  1. shift. The exp is ID [exp] OF exp, so that in the end the result will be:

    ID '[' exp ']' OF exp        --> exp    (rule 25)
    
  2. reduce. The exp is the lvalue ID[exp], using the following productions:

    ID                           --> lvalue (rule 34)
    lvalue '[' exp ']'           --> lvalue (rule 36)
    lvalue                       --> exp    (rule 2)
    

In order to use the second alternative, the parser must immediately reduce ID to lvalue, and therein lies the problem: the parser cannot know which of these two possibilities is correct until it sees the OF following the matching ], but that's far in the future -- in fact, it could be an arbitrary number of tokens away.

The solution here is to avoid forcing the parser to make a decision at this point. There are a couple of possibilities.

  1. Since an expression can only be ID [ exp ] OF (and not anything more complicated), we can factor ID out of the conflict:

    exp   : ID
          | lvalue_not_id
          | ...
    
    lvalue: ID
          | lvalue_not_id
    
    lvalue_not_ID
          : lvalue DOT ID
          | ID            LBRACK exp RBRACK
          | lvalue_not_ID LBRACK exp RBRACK
    

    Comparing the current state machine with the state machine after this change should make it clear how that works (and is a useful exercise in learning about bottom-up parsing).

  2. If you don't want to go to all that work, you can simply add an "apparently redundant" production, as suggested by Appel in his textbook:

    lvalue: ID 
          | lvalue DOT ID 
          | lvalue LBRACK exp RBRACK
          | ID LBRACK exp RBRACK
    

    The added production to lvalue is obviously going to create a shift-reduce conflict; indeed, it is exactly the same shift-reduce conflict as in the original grammar. But this time, the conflict is between two different productions for lvalue, and the default shift action is definitely the one you want to take in the case of a bare ID followed by a [. After the shift, both the lvalue production and the exp production are still available, so the parser doesn't have to make a decision until it finds the token after the ].

    The downside with this solution is that the parser generator will continue to report a shift-reduce conflict, since there clearly is one. Since shift-reduce conflicts are generally considered a sign that the grammar may be ambiguous, leaving a shift-reduce conflict in the code will be a long-term maintenance issue: after every grammar change, it will be necessary to verify that the shift-reduce conflict is benign.

  3. Another solution, which also unfortunately leaves the warning in place, is to use bison's %glr-parser directive to generate a GLR parser. The GLR algorithm is capable of delaying reduction decisions by effectively maintaining two (or more) different possible parser stacks at the same time. For unambiguous grammars, this turns out to still be O(n) in the length of the input, but it is slightly slower. (Also, this option is not available in many other yacc derivatives.)

  4. Finally, you could get rid of lvalue by just adding its productions to exp. You would then need to generalize the lvalue [ exp ] to be exp [ exp ], which means the grammar would recognize a a superset of the original language: it will now accept certain inputs which are not valid. However, it is easy to check in the semantic actions for the relevant productions to see whether an exp has the form of an lvalue or not; if it is not, you can generate a syntax error in the semantic action.