Why this raises a warning about reduce/reduce conflict
root : set1 'X'
| set2 'X' 'X'
set1 : 'A'
| 'B'
set2 : 'B'
| 'C'
but the next is ok?
root : 'A' 'X'
| 'B' 'X'
| 'B' 'X' 'X'
| 'C' 'X' 'X'
The difference is because, in the first case, the parser has to make a choice about the reduction before it gets to see whether one
'X'
or two will follow.In the second, the parser can use the same state, let's call it
BX
, when it's seen aB
and anX
—both shifted—and then depending on the next token, can shift (if anX
), and then reduce the'B' 'X' 'X'
rule, or otherwise reduce'B' 'X'
immediately.Notice that if they didn't have identical tokens immediately following—e.g. you had
set1 'X'
butset2 'Y'
—then there would be no problem, because the lookahead would be able to kick in and pick which reduction to take.Here are the relevant sections from the output from
bison -v
for exposing this issue:Case one
Assuming we get a
'B'
, we go to state 2:Note that there are two possible reductions we can make: to
set1
orset2
, both with the same input token. Hence a reduce/reduce; we only have one token of lookahead, and with this grammar, the only token could be a'X'
—in either case!Case two
Assuming we get a
'B'
, we go to state 2:While we only have one token of lookahead, the parser generator can produce a state where it's seen
'B' 'X'
due to the accommodating structure of the input. Hence we go to state 6 in any event (or error otherwise ;-)):state 6
Here's where the magic happens: if we see an
'X'
, we shift and go to state 9 (where we reduce), otherwise we reduce'B' 'X'
immediately.For completeness, here's state 9:
If we had a token of lookahead to disambiguate
With this sample grammar:
Then, we start:
We shift
'B'
and go to state 2:So, this state is reached in both rules for
set1
andset2
, where we have a single'B'
token on the stack. In this case, if we next see a'Y'
, we reduce toset2
—or in any other case, reduce toset1
.The fact that it's picked
set1
as the "default" reduction could have implications for error handling.Appendix on GLR
Happy (and
bison
oryacc
) produce LALR(1) parsers by default, though you can produce a GLR parser with--glr
(or%glr-parser
in yourbison
declarations file). This can resolve ambiguities by concurrently trying both "possibilities"; the parser will fork and see how far it gets in either case.This is probably unwise unless you really need it, know you need it, and know what could happen if things go wrong, though. I'm not sure what happens if both forks terminate successfully; with my non-scientific testing, it seemed to always end up picking the longer parse.
Lexer hacks
If you don't want to use GLR, but don't want to restructure your parser significantly, you could consider using a lexer hack to overcome this issue.
Right now, you have this:
You could instead emit a token for a single
'X'
character, and a different token for two:This resolves ambiguity within a single token, and is an unambiguous LALR(1) grammar as a result.