Arithmetic expression grammar in prefix notation (Java Cup)

1.9k views Asked by At

I'm writting a grammar for arithmetic expression in prefix notation. However I have an issue when parsing negative numbers or substraction. Grammar example is this:

precedence right +, -;
precedence right *, /;
precedence right uminus;

E ::= + E E
   |  - E E
   |  * E E
   |  / E E
   |  ( E )
   |  - E %prec uminus
   |  id
   |  digit
   ;

But if my input is - 5 4, it reduces 5 as E, next it reduces - E (negative) and then parser gives me a syntax error at 4. The correct one should be 5 as E, next 4 as E and then - E E as E. How can I solve this problem using associativity? or do I need to rewrite my grammar?

2

There are 2 answers

4
rici On BEST ANSWER

(Promoted from comment)

Your grammar really is ambiguous, and precedence declarations won't help you a bit.

Consider the input the input consisting of N - tokens, followed by M 1 tokens.

- - - - - - - ... - 1 1 1 ... 1

In order for this to be an expression, M-1 of the - tokens must be binary, and the remaining N-(M-1) unary, but there is no way to tell which is which (unless they are all binary).

Even if you arbitrarily say that the first N-(M-1) -s are unary, you can't tell what the value of N-(M-1) is until you read the entire input, which means you can't parse with a finite lookahead.

But the whole point of prefix notation is to avoid the need for parentheses. Arbitrary declarations like the above make it impossible to represent alternative interpretations, so that some expressions would be impossible to represent in prefix notation. That's just plain wrong.

Here's a simple case:

- 5 - - - 4 3 1

is either

5 - (- (4 - (3 - 1)))
5 - ((- (4 - 3)) - 1)
5 - (((- 4) - 3) - 1)

In prefix notation, you need to declare the "arity" of every operator, either implicitly (every operator has a known number of arguments), or explicitly using a notation like this, borrowed from Prolog:

-/2 5 -/2 -/2 -/1 4 3 1

Alternatively, you can delimit the arguments with mandatory parentheses, as with Lisp/Scheme "s-exprs":

(- 5 (- (- (- 4) 3) 1))
3
Mario Rossi On

In first place, remove all precedence declarations. They are not needed in prefix grammars. In fact, that should be enough to solve the issue in any parser generator. Which one are you using, BTW?

Cup has a finite lookahead. As @rici points out, the ambiguity can't be resolved in this case. What you can do is to restrict the grammar so just one consecutive unary - can be used.

    B ::= E
       |  - E
       ;
    E ::= + B B
       |  - B B
       |  * B B
       |  / B B
       |  ( B )
       |  id
       |  digit
       ;

Please check the above several times as I'm pretty rusty.