in Bison, how can I specity left associativity for a non-terminal?

2k views Asked by At

I have the following Bison grammar snippet:

binary_op:         BINARY_OP
                    {
                        ...
                    }
                    | '|' %prec BINARY_OP
                    {
                        ...
                    }
;

non_keyword_expr:   non_keyword_expr binary_op non_keyword_expr %prec BINARY_SEND_PREC %dprec 2
                    {
                        ...
                    }
;

| has overloaded meaning in my grammar, so I can't just return it as token BINARY_OP from my lexer. It could be a different token depending on context.

If I use this as my input:

4 OR 5 OR 6

I can parse it successfully (OR is recognized as a BINARY_OP token by the lexer).

If, however, my input is this:

4 | 5 | 6

I get an ambiguous grammar error. (the | isn't being recognized as left associative)

How can I get binary_op to be left-associative within non_keyword_expr? the %prec statement on the second rule for binary_op seems to have no effect.

edit: this is for a GLR parser

2

There are 2 answers

11
rici On BEST ANSWER

Simple answer: You cannot. Precedence (and associativity) only apply to productions (on the left) and terminals (on the right). They do not apply to non-terminals.

That's not an arbitrary decision; it's inherent to the way bison handles resolution of shift/reduce conflicts. At every parsing step, the lookahead token (a terminal) must either eventually be shifted, but it is possible that there is a production which could be reduced before the terminal is shifted. If the reduction is not performed immediately, it will never be performed. An LR(1) grammar allows the parser to decide based on the current parse stack and the lookahead token whether a reduction should be performed or whether the lookahead token should be immediately shifted. If both actions are possible, then the grammar is said to have a shift/reduce conflict, and it is not strictly speaking LR(1).

Precedence and associativity rules are used to resolve shift/reduce conflicts. Productions may have an implicit or explicit precedence: the explicit precedence is provided by the %prec declaration; otherwise the precedence of the last terminal in the production is used. In the event of a shift/reduce conflict, the precedence of the production which could be reduced is compared to the precedence of the lookahead terminal which could be shifted, and whichever has the greater precedence wins. That's it. The precedence is not retained or inherited. In fact, it is inaccurate to say that the precedences are compared, since that doesn't happen during the parse; the parser has an action or transition table which defines what to do in the case of a particular stack configuration ("state") and lookahead token, and the precedence information is used at parser-generation time to fill in the entries in the action table which would otherwise be ambiguous.

In the case of your production

binary_op: '|' %prec BINARY_OP

the %prec declaration is useless, because binary_op must always be reduced immediately; it cannot participate in a shift/reduce conflict. The shift/reduce conflict comes with the non_keyword_expression production, which is marked with a (different) %prec declaration, and that is the declaration which will be used for that production.

The production for non_keyword_expression does not have a terminal, so it has no implicit precedence either. That's generally not what you want, and the use of productions like:

binary_op: '|' | "OR" ;

are not really compatible with the use of precedence to resolve parsing conflicts.


Note 1: This is not completely true if you ask for a GLR parser. The GLR parser can perform both the shift and the reduce, because it (effectively) maintains a number of parser states at the same time. Eventually, all but one of these states must be eliminated; otherwise, the parse is ambiguous. GLR parsers use precedence (and %prec declarations) in precisely the same way that non-GLR parsers do; a parser action eliminated by precedence is really eliminated and does not lead to parallel states. However, a GLR parser can also deal with reduce/reduce conflicts, in which there are two possible reductions (possibly to the same non-terminal). These conflicts can be resolved using %dprec ("dynamic precedence") declarations.

0
user3125280 On

Bison's precedence for rules works by comparing the rule's precedence to the precedence of all conflicting tokens to shift, when resolving an s/r conflict. So it compares BINARY_SEND_PREC to the precedence of '|' and 'OR'. For 'OR' it picks the reduce. To get the reduce for '|' as well the token '|' itself would need to be %left '|'. for them to work together '|' and 'OR' need the same precedence.

There is a workaround for this kind of problem if you can specify the associativty of terminals 'OR' and '|', etc and set their precedence the same. With a couple changes the infix calculator example can parse inputs like this:

2 PLUS -3 TIMES 4 ^ 2 + 3

-43

The major changes look like this:

%token PLUS
%token TAKE
%left '-' '+' PLUS TAKE

... 

add:      '+' | PLUS;
exp:      NUM                           { $$ = $1;         }
        | exp add exp        %prec '+'  { $$ = $1 + $3;    }

Precedence on non-terminals would be a useful extension to bison IMHO. It would allow the user to fix s/r conflicts by favouring the reduction when the a prefix of the non-terminal could be shifted (and only when it could be shifted for a non-terminal with precedence - there may be other valid reasons to shift). In fact I found this question after trying to implement a haskell style function application syntax ie

 x y z -> ((x y) z)

but because a single terminal alone is also valid here, reducing an x/y/z to the non-terminal is valid. Therfore bison would reach non-term_x non-term_y | z (| is the stack/lookahead boundary) and not know whether to reduce to non-term_x_y or shift z. (A similar trick works here fortunately)

I dug around in the bison source a little, but I can't see a simple way to allow %prec on non-terminals. When s/r conficts are resolved, only the reducing rule is known and the conflicting token to shift, the precedences of which are compared. You would need to know all valid shifiting reasons here, and there are ways to access the confliciting shift rules, so maybe.. you would need to divide the shiftable tokens into groups corresponding to the rules they would eventually reduce by, and then compare rules' precedences. Somethin I'll look at someday...