Generating a parser for expressions

Question

Generating a parser for expressions

407 views Asked by Allan Simonsen At 20 August 2021 at 04:48

I am trying to generate a javascript parser for the ebnf grammar described in this Microsoft article. The ebnf specified in the article does not work when I use it as its written, so I have tried to simplify it to make it work with the REx parser generator.

The goal is in Javascript to be able to parse and evaluate expressions like these to True or False:

AttributeA > 2 - The value of AttributeA is greater than 2
HasCategory(Assembly) - The node has Category Assembly
Assembly - The node has Category Assembly
HasValue(AttributeA) - The attribute AttributeA has a value. Its not undefined.
AttributeA < AttributeB - The value of attribute AttributeA is less than the value of attribute Attribute B
IsReference - The value of the attribute IsReference is True
AttributeA + 2 > 5 and AttributeB - 5 != 7
AttributeA * 1.25 >= 500

I am using the REx parser generator online here: https://www.bottlecaps.de/rex/. If you have suggestions for other parser generators that produce JavaScript I would appreciate some links to where I can find them.

The issue I'm struggling with is the definition of the MethodCall. I have tried a lot of different definitions but all fail. When I remove the MethodCall and MethodArgs definition the REx parser generator produces a parser.

So I would appreciate any help to crack this problem a lot.

Below is the grammar as far as I have been able to get.

Expression
    ::= BinaryExpression | MethodCall | "(" Expression ")" | Number
BinaryExpression
    ::= RelationalExpression ( ( '=' | '!=' ) RelationalExpression )*
RelationalExpression
    ::= AdditiveExpression ( ( '<' | '>' | '<=' | '>=' | 'and' | 'or' ) AdditiveExpression )*
AdditiveExpression
    ::= MultiplicativeExpression ( ( '+' | '-' ) MultiplicativeExpression )*
MultiplicativeExpression
    ::= UnaryExpression ( ( '*' | '/' | '%' ) UnaryExpression )*
UnaryExpression
    ::= "!" Identifier | "+" Identifier | "-" Identifier | Identifier

MethodCall
    ::= Identifier "(" MethodArgs* ")"
MethodArgs
    ::= Expression | Expression "," MethodArgs

Identifier
    ::= Letter ( Letter | Digit | "_" )*
Number
    ::= Digit ('.' Digit) | (Digit)*

<?TOKENS?>

Letter
    ::= [A-Za-z]
Digit
    ::= [0-9]

Here are some of the different versions for the MethodCall definition I have tried but all of them fail.

MethodCall
    ::= Identifier "(" MethodArgs? ")"
MethodArgs
    ::= Expression ("," MethodArgs)*

MethodCall
    ::= Identifier "(" MethodArgs ")"
MethodArgs
    ::= ( Expression ("," MethodArgs)* )?

MethodCall
    ::= MethodName "(" MethodArgs? ")"
MethodName
    ::= Identifier
MethodArgs
    ::= Expression ("," MethodArgs)*

MethodCall
    ::= Identifier "(" MethodArgs ")"
MethodArgs
    ::= Expression ("," MethodArgs)* | ""

MethodCall
    ::= Identifier MethodArgs
MethodArgs
    ::= "(" (Expression ("," MethodArgs)* | "")  ")"

MethodCall
    ::= Identifier "(" MethodArgs ")"
MethodArgs
    ::= Expression | Expression "," MethodArgs | ""

MethodCall
    ::= Identifier "(" Expression ")"

I have tried to get inspiration from a number of other languages to see how they do it, but with no luck yet so far:

Update

Just wanted to let you know how this turned out. I struggled to get the EBNF grammar to do what I needed it to do, so I took a look at Nearley like @rici suggested and converted my grammar into the Nearley grammar syntax and got the tooling to work. It was just a much better choice for my project. The documentation is very good, the tooling is also great and the error messages is very helpful. So a huge thanks to @rici for suggesting Nearley.

Below is the grammar I have implemented. I have tested with the following inputs:

'2 + 4', '2 + 4 - 6', '(2 + 4)', '!true', '!(true)', 'hasCategory(test)', 'hasCategory(test,test2)', 'hasCategory( test , test2 )', 'hasCategory(test,test2, test3)', 'IsReference', 'IsReference()', '2 * 4', '(2 / 4)', '2 * 4 + 2', '(2 / 4) + 2', '2 > 4', '2 >= 2', '2 = 4', '2 == 2', '2 != 4', '2 !== 2', '(2 * 4 + 2) > 4', '(2 * 4 + 2) > (4 + 10)', 'true', 'true or false', 'true || false', 'true and false', 'true && false', '(true or false)', '!(true or false)', '2 != 1+1', '2 != (1+1)', '2 != (1+2)', '(2 > 2 or (2 != 1+1))',

@builtin "whitespace.ne" # `_` means arbitrary amount of whitespace
@builtin "number.ne"     # `int`, `decimal`, and `percentage` number primitives
@builtin "postprocessors.ne"

@{%
function methodCall(nameId, argsId = -1) {
  return function(data) {
      return {
          type: 'methodCall',
          name: data[nameId],
          args: argsId == -1 ? [] : data[argsId]
      };
    }
}

function value() {
  return function(data) {
      return {
          type: 'value',
          value: data[0]
      };
    }
}
%}
expression -> 
    methodCall {% id %}
  | relationalExpression {% value() %}
  | booleanExpression  {% value() %}
  | _ identifier _  {% methodCall(1) %}

booleanExpression ->
      parentheses {% id %}
    | parentheses _ "and"i _ parentheses {% d => d[0] && d[4] %}
    | parentheses _ "&&" _ parentheses {% d => d[0] && d[4] %}
    | parentheses _ "or"i _ parentheses {% d => d[0] || d[4] %}
    | parentheses _ "||" _ parentheses {% d => d[0] || d[4] %}

parentheses ->
    _ "(" relationalExpression ")" _ {% nth(2) %}
  |  _ "(" booleanExpression ")" _ {% nth(2) %}
  | unaryExpression {% id %}

relationalExpression -> 
      _ additiveExpression _ {% nth(1) %}
    | relationalExpression _ "=" _ additiveExpression {% d => d[0] == d[4] %}
    | relationalExpression _ "==" _ additiveExpression {% d => d[0] == d[4] %}
    | relationalExpression _ "!=" _ additiveExpression {% d => d[0] != d[4] %}
    | relationalExpression _ "!==" _ additiveExpression {% d => d[0] != d[4] %}
    | relationalExpression _ "<" _ additiveExpression {% d => d[0] < d[4] %}
    | relationalExpression _ ">" _ additiveExpression {% d => d[0] > d[4] %}
    | relationalExpression _ "<=" _ additiveExpression {% d => d[0] <= d[4] %}
    | relationalExpression _ ">=" _ additiveExpression {% d => d[0] >= d[4] %}

additiveExpression -> 
    _ multiplicativeExpression _ {% nth(1) %}
  | additiveExpression _ "+" _ multiplicativeExpression {% d => d[0] + d[4] %}
  | additiveExpression _ "-" _ multiplicativeExpression {% d => d[0] - d[4] %}

multiplicativeExpression ->
    _ parentheses _  {% nth(1) %}
  | parentheses _ "*" _ parentheses {% d => d[0] * d[4] %}
  | parentheses _ "/" _ parentheses {% d => d[0] / d[4] %}
  | parentheses _ "%" _ parentheses {% d => d[0] % d[4] %}

unaryExpression ->  
    _ "!" _ expression _ {% d => !d[3] %}
  | _ decimal _ {% nth(1) %}
  | _ unsigned_int _ {% nth(1) %}
  | _ boolean _ {% nth(1) %}
  | _ identifier _ {% nth(1) %}

methodCall -> 
      identifier "(" methodArgs ")" {% methodCall(0, 2) %}
    | identifier "(" _ ")" {% methodCall(0) %}

methodArgs ->
    _ identifier _  {% d => [d[1]] %}
  | _ identifier _ "," _ methodArgs _ {% d => [d[1]].concat(d[5]) %}

boolean ->
    "true"i {% () => true %} 
  | "false"i {% () => false %}

identifier -> 
  [A-Za-z0-9_]:+ {% (data, l, reject) => {
    var ident = data[0].join('');
    if (ident.toLowerCase() === 'true' || ident.toLowerCase() === 'false') {
      return reject;
    } else {
      return ident;
    }
  }
   %}

Original Q&A

There are 1 answers

**kaby76** · Accepted Answer · 2021-08-20T11:56:19+00:00

There are a few problems with your grammar, but it's mostly fine.

'and' and 'or' conflict with Identifier. So, subtract those string literals from Identifier in its rule.
Number was missing parentheses. It should be Number ::= Digit ( ('.' Digit) | (Digit)* )
You are missing the EOF rule. Almost every parser generator I know requires a bottom/EOF rule to force consumption of the entire input. I added the rule "Input".
Make sure to click the "configure" box, then "backtracking". Your grammar is ambiguous, which is fine, but requires you to tell the parser generator to handle that.

Parser generators have a slightly different syntax for "EBNF", which is what REx takes. REx adds a <?TOKENS?> string to denote the boundary between parser and lexer rules. Microsoft says the grammar is "BNF" but it's not because it uses the Kleene operator <Identifier> ::= [^. ]*, an EBNF construct. It also fudges the definition of <Literal> and <Number> with prose.

I haven't tested the generated parser, but it seems like a straightforward recursive descent implementation. The parser generators that I am familiar with, and that are popular, are listed in the conversion page. (I'm writing converters for all of them and many more.)

Try this:

Input ::= Expression EOF

Expression
    ::= BinaryExpression | MethodCall | "(" Expression ")" | Number

BinaryExpression
    ::= RelationalExpression ( ( '=' | '!=' ) RelationalExpression )*
RelationalExpression
    ::= AdditiveExpression ( ( '<' | '>' | '<=' | '>=' | 'and' | 'or' ) AdditiveExpression )*
AdditiveExpression
    ::= MultiplicativeExpression ( ( '+' | '-' ) MultiplicativeExpression )*
MultiplicativeExpression
    ::= UnaryExpression ( ( '*' | '/' | '%' ) UnaryExpression )*
UnaryExpression
    ::= "!" Identifier | "+" Identifier | "-" Identifier | Identifier

MethodCall
    ::= Identifier "(" MethodArgs* ")"
MethodArgs
    ::= Expression | Expression "," MethodArgs

<?TOKENS?>

Identifier
    ::= ( ( Letter ( Letter | Digit | "_" )* ) - ( 'and' | 'or' ) )
Number ::= Digit ( ('.' Digit) | (Digit)* )

Letter
    ::= [A-Za-z]
Digit
    ::= [0-9]

EOF      ::= $

TechQA.

Generating a parser for expressions

There are 1 answers

Related Questions in PARSING

Related Questions in GRAMMAR

Related Questions in EBNF

Related Questions in NEARLEY

Popular Questions

Popular Tags

Trending Questions