Building a boolean logic parser in Scala with PackratParser

133 views Asked by At

I am trying to build a Boolean logic parser e.g. A == B AND C == D to output something like And(Equals(A,B), Equals(C,D))

My parser has the following definitions:

def program: Parser[Operator] = {
    phrase(operator)
}
def operator: PackratParser[Operator] = {
    leaf | node
}
def node: PackratParser[Operator] = {
    and | or 
}
def leaf: PackratParser[Operator] = {
    equal | greater | less
}
def and: PackratParser[Operator] = {
    (operator ~ ANDT() ~ operator) ^^ {
      case left ~ _ ~ right => And(left, right)}
}

I would expect the parser to map to program -> operator -> node -> and -> operator (left) -> leaf -> equal -> operator (right) -> leaf -> equal. This doesn't work. However if in the above code I do the changes

def operatorWithParens: PackratParser[Operator] = {
    lparen ~> (operator | operatorWithParens) <~ rparen
}

and change and to be

def and: PackratParser[Operator] = {
    (operatorWithParens ~ ANDT() ~ operatorWithParens) ^^ {
      case left ~ _ ~ right => And(left, right)}
}

Parsing (A == B) AND (C == D) succeeds.

I can not wrap my head around why the former doesn't work while the later does. How should I change my code to be able to parse A == B AND C == D?

EDIT: Following @Andrey Tyukin advice I've modified the gramma to account for precedence

def program: Parser[Operator] = positioned {
    phrase(expr)
}
def expr: PackratParser[Operator] = positioned {
    (expr ~ ORT() ~ expr1) ^^ {
      case left ~ _ ~ right => Or(left, right)} | expr1
}
def expr1: PackratParser[Operator] = positioned {
    (expr1 ~ ANDT() ~ expr2) ^^ {
      case left ~ _ ~ right => And(left, right)} | expr2
}
def expr2: PackratParser[Operator] = positioned {
    (NOTT() ~ expr2) ^^ {case _ ~ opr => Not(opr)} | expr3
}
def expr3: PackratParser[Operator] = {
    lparen ~> (expr) <~ rparen | leaf
}

And although PackratParser supports left-recursive grammar, I run into an infinite loop that never leaves expr

1

There are 1 answers

7
Andrey Tyukin On

It looks like there is a path from operator to a shorter operator:

operator -> node -> and -> (operator ~ somethingElse)

You seem to be assuming that the shorter operator (left) will somehow reduce to leaf, whereas the outermost operator would skip the leaf and pick the node, for whatever reason. What it does instead is just chocking on the first leaf it encounters.

You could try to move the node before the leaf, so that the whole operator doesn't choke on the first A when seeing sth. like A == B AND ....

Otherwise, I'd suggest to refactor it into

  • disjunctions
  • of conjunctions
  • of atomic formulas

where atomic formulas are either

  • comparisons or
  • indivisible parenthesized top-level elements (i.e. parenthesized disjunctions, in this case).

Expect to use quite a few repSeps.