Bison Shift/Reduce Conflict for a programming language grammar

345 views Asked by At

I'm writing a programming language parser and I'm stuck in this Shift/Reduce Conflict.

Here's the conflict state in the parser.output file obtained via running bison with -v

State 1

   24 ident: TIDENT .
   26 call: TIDENT . TLPAREN args TRPAREN

    TLPAREN  shift, and go to state 24

    TLPAREN   [reduce using rule 24 (ident)]
    $default  reduce using rule 24 (ident)

The conflict is happening when I'm trying to implement a rule for call, it seems to conflict with the normal ident rule.

Here's some parts of the grammar, (actions removed for simplicity but they shouldn't be needed. also I'm not sure if the order in which rules are defined matters, correct me if I'm wrong)

(capital letters are tokens)

The ident rule is simply

ident: TIDENT
          ;

Args, used by call.

args: /* empty */
        |
        expr
        |
        args TCOMMA expr
        ;

Call for calling a function

call:
       TIDENT TLPAREN args TRPAREN
       ;

Expr for expressions

expr:
    number
    |
    ternary
    |
    bool
    |
    string
    |
    ident
    |
    call
    |
    TLPAREN expr TRPAREN
    |
    expr TPLUS expr
    |
    expr TMINUS expr
    |
    expr TSLASH expr
    |
    expr TSTAR expr
    |
    expr TGT expr
    |
    expr TGE expr
    | 
    expr TLT expr
    |
    expr TLE expr
    ;

The question: why does the grammar have a shift/reduce conflict and how do you fix it? I've seen similar style parsers that do not have the conflicts its really weird.

If you need to see the full grammar for reproducing here's a hastebin https://hasteb.in/zozifopi.shell

If you need more details about anything else then please let me know in the comments and I'll edit the question accordingly.

3

There are 3 answers

1
rici On BEST ANSWER

The fundamental problem here is that your grammar is ambiguous because statements don't need to be terminated (stmts: stmts stmt) and a statement can be an expression. So two expressions can appear one after another without any punctuation. That means that f(3) could be one expression (a function call) or two expressions (f and (3)).

If you're happy for the parser to always interpret that as a function call, (which is its default behaviour, since it prefers to shift), then you could just add a couple of precedence declarations, so that the call has higher precedence than the reduction:

%precedence TIDENT
//...
%precedence TLPAREN
// ...
%%
expr : ident %prec TIDENT

That just papers over the ambiguity, and may cause surprising parses. But the only other solution is to make the language unambiguous.

6
John Bollinger On

The problem is that when the parser has shifted a TIDENT token and looks ahead to a TLPAREN token, the grammar permits two alternatives:

  1. reduce the TIDENT to an ident, or
  2. shift the TLPAREN.

Bison will normally resolve shift / reduce conflicts by choosing to shift, and if that's what you want in this case then you could simply ignore the warning.

In this particular case, however, you should be able to resolve the conflict by changing the rule for the call production:

call:
       ident TLPAREN args TRPAREN
       ;

With that rule, it is no longer an option to shift the TLPAREN without first reducing the TIDENT to an ident.

Alternatively, you could consider removing the ident non-terminal altogether, and instead using TIDENT directly wherever you now use ident. There may be other alternatives, too. Which works best for you may depend on what you're trying to do with your semantic actions. I can't comment any more specifically on that, as you have chosen to keep the semantic actions out of our consideration.

7
Jack On

Bison by default generates a LR parser, which is a bottom-up parser which can decide on each state if shifting a token or reducing it.

The conflict is really simple and well explained by the output itself (I wonder what's not clear), it's telling you:

If I find an IDENTIFIER should I reduce it through rule 24 to an ident non terminal or should I shift it as in call rule?

This because once reduced you can't shift and viceversa, which created indeed a conflict.

To solve the conflict you need to move that choice in the same state of the parser so that it's able to decide by the context.

Since ident has just a terminal IDENT rule and same applies for call you could easily move everything at same level to make it always shift:

expr: 
  IDENT | 
  IDENT LPAREN args RPAREN |
  ...

or use the same ident non terminal for both call and expr itself so it will always reduce it.