PEG Grammar Parsing, error when expression starts with negative number

333 views Asked by At

I have the following PEG grammar defined:

Program = _{ SOI ~ Expr ~ EOF }

Expr = { UnaryExpr | BinaryExpr }

Term = _{Int | "(" ~ Expr ~ ")" }

UnaryExpr = { Operator ~ Term }

BinaryExpr = { Term ~ (Operator ~ Term)* }

Operator = { "+" | "-" | "*" | "^" }

Int = @{ Operator? ~ ASCII_DIGIT+ }

WHITESPACE = _{ " " | "\t" }

EOF = _{ EOI | ";" }

And the following expressions are all parsed correctly:

1 + 2   
1 - 2    
1 + -2   
1 - -2   
1        
+1       
-1

But any expression that begins with a negative number errors

-1 + 2

errors with

  --> 1:4
  |
1 | -1 + 2
  |    ^---
  |
  = expected EOI

What I expect (what I would like) is for -1 + 2 to be treated the same as 1 + -2, that is a Binary expression that is made up of two Unary Expressions.

I have toyed around with a lot of variations with no success. And, I'm open to using an entirely different paradigm if I need to, but I'd really like to keep the UnaryExpression idea since I've already built my parser around it.

I'm new to PEG, so I'd appreciate any help.

For what its worth, I'm using Rust v1.59 and https://pest.rs/ to both parse and test my expressions.

1

There are 1 answers

0
Akida On BEST ANSWER

You have a small error in the Expr logic. The first part before the | takes precedence if both match. And -1 is a valid UnaryExpr so the program as a whole is expected to match SOI ~ UnaryExpr ~ EOF in this case. But there is additional data (+ 2) which leads to this error.

If you reverse the possibilities of Expr so that Expr = { BinaryExpr | UnaryExpr } the example works. The reason for that is that first BinaryExpr will be checked and only if that fails UnaryExpr.