Adding rule to Treesitter LR1 grammar changes precedence

135 views Asked by At

I am trying to get operator precedence correct in a Treesitter grammar. Treesitter is a LR1 parser generator.

I have a straightforward artithmetic grammar, which partially looks like this:

multiply_expression: $ => prec.left(2, seq(
    $._expression,
    '*',
    $._expression,
)),

addition_expression: $ => prec.left(1, seq(
    $._expression,
    '+',
    $._expression,
)),

This works correctly. multiply_expression indeed gets a higher precedence than addition_expression.

However, the precedence changes when I add an intermediate rule:

_partial_multi: $ => seq(
    $._expression,
    '*',
),

multiply_expression: $ => prec.left(2, seq(
    $._partial_multi,
    $._expression,
)),

I moved $.expression, '*' to its own rule. To me, this seems to be an equivalent grammar, and I expect no changes. However, with this change the precedence is no longer correct. addition_expression, which remained unchanged, seems to have a higher precedence than multiply_expression.

Why does introducing an extra step change the precedence? Is there a name for this problem, or where can I find more information about it? When writing a grammar or fixing precedence problems, are there rules to follow or ways to think about this?

1

There are 1 answers

0
ahelwer On

Here is your full grammar, for reproducibility:

module.exports = grammar({
  name: 'github_example',

  conflicts: $ => [],

  rules: {
    source_file: $ => $._expression,

    _expression: $ => choice(
      $.number,
      $.multiply_expression,
      $.addition_expression
    ),

    number: $ => /\d+/,

    _partial_multi: $ => seq(
        $._expression,
        '*',
    ),

    multiply_expression: $ => prec.left(2, seq(
        $._partial_multi,
        $._expression,
    )),

    addition_expression: $ => prec.left(1, seq(
        $._expression,
        '+',
        $._expression,
    )),
  }
});

You can fix this issue by adding precedence to the _partial_multi rule and removing left-associative precedence from the multiply_expression rule:

   _partial_multi: $ => prec(2, seq(
        $._expression,
        '*',
    )),

    multiply_expression: $ => seq(
        $._partial_multi,
        $._expression,
    ),

What you've done here is make multiplication a right-associative operator of precedence 2. This is how you define left- or right-associativity in grammars which don't expose it as a primitive. You can make multiplication left-associative by writing it as follows:

    _partial_multi: $ => prec(2, seq(
        '*',
        $._expression,
    )),

    multiply_expression: $ => seq(
        $._expression,
        $._partial_multi,
    ),

You've actually stumbled across something quite interesting, which is that you don't need explicit language constructs to define precedence & associativity in a grammar! They're just "syntactic sugar" to make the grammar easier to read & write. See this page for more information on how to specify precedence & associativity by decomposing rules. You can see that specifying precedence & associativity purely through grammar constructs is confusing, and almost reads backwards from what you expect unless you think about it! As you also discovered, mixing these two approaches (specifying precedence & associativity through language constructs and grammar constructs) can lead to confusing behavior. It is best to stick with one or the other.