Is Lemon correctly handling nonassoc precedence?

157 views Asked by At

I feel like the Lemon parser generator is doing it wrong with nonassoc precedence. I have a simplified grammar that exhibits the problems I'm seeing.

%nonassoc EQ.
%left PLUS.

stmt ::= expr.

expr ::= expr EQ expr.
expr ::= expr PLUS expr.
expr ::= IDENTIFIER.

Yields a report with a conflict like so:

State 4:
      expr ::= expr * EQ expr
  (1) expr ::= expr EQ expr *
      expr ::= expr * PLUS expr

                        EQ shift  2
                        EQ reduce 1   ** Parsing conflict **
                      PLUS shift  1
                 {default} reduce 1

If I tell it that equals is left associative, the problem goes away. It's as if nonassoc doesn't put the rule into the precedence set. Comparing to a Bison version of that grammar, there is no conflict. And assignment really should be nonassociative. I'd rather not lie to it about that to work around this.

1

There are 1 answers

0
acbarrentine On

After spending some time poring over the reports generated by both Lemon and Bison for the associated grammars, I can only conclude that Lemon is, indeed, mishandling nonassoc precedence. The smoking gun is contained in that state 4 quoted above, but I should probably lay out some more detail for clarity.

The states the build up to expr EQ are straightforward. You arrive at state 2 then:

State 2:
      expr ::= * expr EQ expr
      expr ::= expr EQ * expr
      expr ::= * expr PLUS expr
      expr ::= * IDENTIFIER

                IDENTIFIER shift  5
                      expr shift  4

This state contains the current expr EQ item, which expects to be followed by another expr. Because of that, it contains the First set for expr, which are the 3 entries starting with * in the state. If we read an expr in this state, we'll land in state 4 with an item either partway through the reduction or at the end.

expr ::= expr * EQ expr
expr ::= expr EQ expr *

What happens if we read an EQ in this state? I told Lemon the answer. It's an error because EQ is nonassociative. Instead it reports a shift/reduce conflict. In practice, it will shift, which will let it accept an illegal parse, such as x=y=z.

Bison contains these same states, numbered differently, but with a telling distinction.

state 8

    2 expr: expr . EQ expr  [$end, PLUS]
    2     | expr EQ expr .  [$end, PLUS]
    3     | expr . PLUS expr

    EQ  error (nonassociative)

    $default  reduce using rule 2 (expr)

    Conflict between rule 2 and token PLUS resolved as reduce (PLUS < EQ).
    Conflict between rule 2 and token EQ resolved as an error (%nonassoc EQ).

Bison knows what nonassociative means, and uses that to eliminate the supposed ambiguity if it sees a second EQ in an expression.