I'm trying to learn about shift-reduce parsing. Suppose we have the following grammar, using recursive rules that enforce order of operations, inspired by the ANSI C Yacc grammar:
S: A;
P
: NUMBER
| '(' S ')'
;
M
: P
| M '*' P
| M '/' P
;
A
: M
| A '+' M
| A '-' M
;
And we want to parse 1+2 using shift-reduce parsing. First, the 1 is shifted as a NUMBER. My question is, is it then reduced to P, then M, then A, then finally S? How does it know where to stop?
Suppose it does reduce all the way to S, then shifts '+'. We'd now have a stack containing:
S '+'
If we shift '2', the reductions might be:
S '+' NUMBER
S '+' P
S '+' M
S '+' A
S '+' S
Now, on either side of the last line, S could be P, M, A, or NUMBER, and it would still be valid in the sense that any combination would be a correct representation of the text. How does the parser "know" to make it
A '+' M
So that it can reduce the whole expression to A, then S? In other words, how does it know to stop reducing before shifting the next token? Is this a key difficulty in LR parser generation?
Edit: An addition to the question follows.
Now suppose we parse 1+2*3. Some shift/reduce operations are as follows:
Stack    | Input | Operation
---------+-------+----------------------------------------------
         | 1+2*3 |
NUMBER   | +2*3  | Shift
A        | +2*3  | Reduce (looking ahead, we know to stop at A)
A+       | 2*3   | Shift
A+NUMBER | *3    | Shift (looking ahead, we know to stop at M)
A+M      | *3    | Reduce (looking ahead, we know to stop at M)
Is this correct (granted, it's not fully parsed yet)? Moreover, does lookahead by 1 symbol also tell us not to reduce A+M to A, as doing so would result in an inevitable syntax error after reading *3?
The problem you're describing is an issue with creating LR(0) parsers - that is, bottom-up parsers that don't do any lookahead to symbols beyond the current one they are parsing. The grammar you've described doesn't appear to be an LR(0) grammar, which is why you run into trouble when trying to parse it without lookahead. It does appear to be LR(1), however, so by looking 1 symbol ahead in the input you could easily determine whether to shift or reduce. In this case, an LR(1) parser would look ahead when it had the 1 on the stack, see that the next symbol is a +, and realize that it shouldn't reduce past A (since that is the only thing it could reduce to that would still match a rule with + in the second position).
An interesting property of LR grammars is that for any grammar which is LR(k) for k>1, it is possible to construct an LR(1) grammar which is equivalent. However, the same does not extend all the way down to LR(0) - there are many grammars which cannot be converted to LR(0).
See here for more details on LR(k)-ness: http://en.wikipedia.org/wiki/LR_parser
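To make the lookahead idea concrete, here is a minimal sketch in Python of a hand-written shift-reduce loop for the grammar in the question. It is not what Yacc generates: a real LR(1) parser drives its decisions from a state table, whereas this sketch simply inspects the top of the stack and the next token. The names tokenize, parse, reduce, and the "$" end marker are my own, chosen for illustration.

```python
import re

def tokenize(text):
    """Turn text into grammar terminals; "$" is an assumed end-of-input marker."""
    return ["NUMBER" if t.isdigit() else t
            for t in re.findall(r"\d+|\S", text)] + ["$"]

def parse(text):
    """Return the list of shift/reduce steps, or raise SyntaxError."""
    tokens = tokenize(text)
    stack, pos, trace = [], 0, []

    def reduce(count, lhs):
        # Replace the top `count` symbols with the rule's left-hand side.
        rhs = stack[-count:]
        del stack[-count:]
        stack.append(lhs)
        trace.append(f"reduce {lhs} <- {' '.join(rhs)}")

    while True:
        look = tokens[pos]                  # the single lookahead token
        top = stack[-1] if stack else None
        below = stack[-3:-1]                # the two symbols under the top
        if top == "NUMBER":
            reduce(1, "P")
        elif stack[-3:] == ["(", "S", ")"]:
            reduce(3, "P")
        elif top == "P":
            reduce(3 if below in (["M", "*"], ["M", "/"]) else 1, "M")
        elif top == "M" and look not in ("*", "/"):
            # Only leave M when no '*' or '/' follows.
            reduce(3 if below in (["A", "+"], ["A", "-"]) else 1, "A")
        elif top == "A" and look not in ("+", "-"):
            # Only leave A when no '+' or '-' follows.
            reduce(1, "S")
        elif stack == ["S"] and look == "$":
            trace.append("accept")
            return trace
        elif (top in ("M", "A")
              or (top == "S" and look == ")")
              or (top not in ("S", ")") and look in ("NUMBER", "("))):
            stack.append(look)
            pos += 1
            trace.append(f"shift {look}")
        else:
            raise SyntaxError(f"unexpected {look!r} with stack {stack}")

if __name__ == "__main__":
    for step in parse("1+2*3"):
        print(step)
```

Running it on 1+2*3 shows both decisions from the question: with NUMBER reduced up to M and + as the lookahead, it reduces once more to A and then shifts; with A + M on the stack and * as the lookahead, it shifts instead of reducing to A.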