Nearley grammar matches the same bit of text as a terminal and a non-terminal one after the other, producing wrong result

Grammar noob here.

I need to parse math formulas similar to those accepted by SymPy and transform them into some kind of left-to-right syntax tree, using Nearley and this grammar.

The problem appears when I have an expression like a*sin(x)*y where sin is first recognised as a SY, and then as a FN. I think this is sort of a necessary evil if I want to be able to parse variables (that's what SY is for). The result is something like

[ { type: 'Symbol',
    properties: { letter: 'a' },
    children:
     { right:
        { type: 'Symbol',
          properties: { letter: 'sin' },
          children:
           { right:
              { type: 'Fn',
                properties: { name: 'sin' },
                children:
                 { argument: { type: 'Symbol', properties: { letter: 'x' }, children: {} },
                   right: { type: 'Symbol', properties: { letter: 'y' }, children: {} } } } } } },
    position: { x: 200, y: 200 } } ]

Even worse, when the expression is a*sin(x)^y, I get

[ { type: 'Symbol',
    properties: { letter: 'a' },
    children:
     { right:
        { type: 'Symbol',
          properties: { letter: 'sin' },
          children:
           { right:
              { type: 'Fn',
                properties: { name: 'sin' },
                children:
                 { argument: { type: 'Symbol', properties: { letter: 'x' }, children: {} },
                   superscript: { type: 'Symbol', properties: { letter: 'y' }, children: {} },
                   right: [Circular] } } } } },
    position: { x: 200, y: 200 } } ]

I presume [Circular] means there's some sort of wicked loop somewhere.
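
For context, [Circular] is what Node's object printer shows when a property points back at an object that is already being printed higher up in the same structure, so one of the nodes ends up as its own descendant. Older Node versions do not say which ancestor is involved. A minimal sketch of a helper that reports where the loop closes follows; findCycle and the sample nodes are hypothetical and only mimic the shape of the output above, they are not taken from the actual transformation functions.

```javascript
// Hypothetical debugging helper: returns the dotted key path at which an
// object refers back to one of its own ancestors, or null for a plain tree.
function findCycle(node, path = [], ancestors = new Set()) {
  if (node === null || typeof node !== "object") return null;
  if (ancestors.has(node)) return path.join(".");
  ancestors.add(node);
  for (const key of Object.keys(node)) {
    const hit = findCycle(node[key], [...path, key], ancestors);
    if (hit !== null) {
      ancestors.delete(node);
      return hit;
    }
  }
  // Leaving this branch: the node is no longer an ancestor, so shared
  // (but non-cyclic) references are not reported as loops.
  ancestors.delete(node);
  return null;
}

// Sample data (assumed shape): an Fn node whose `right` child is the node itself.
const fn = { type: "Fn", properties: { name: "sin" }, children: {} };
fn.children.right = fn;

const tree = {
  type: "Symbol",
  properties: { letter: "a" },
  children: { right: fn },
};

console.log(findCycle(tree)); // prints "children.right.children.right"
```

Running something like this on the parser result just before printing it should show which assignment in the post-processing attaches a node to itself or to one of its parents.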

I suspect I could resolve the first issue above by hardcoding a check that replaces the SY with the correct FN if the two "match", but I'd rather avoid such a botch-up. I have no clue as to what's happening with the second one, though I have been on this for a full day and my mind is likely clouded. I'll investigate more once I get to the office today.

Any clues?

EDIT: I managed to "solve" the first issue (Fn as a child of a Symbol of the same name) with a horrible hack. The circular problem remains. I'm investigating, but I am probably going to find another horrible hack. If at all possible, I'd prefer a fix in the grammar over a fix in the transformation functions.
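
One direction that might avoid the hack is to decide once, before the grammar runs, whether an identifier is a function name or a symbol, so that the same text can never be consumed as both SY and FN. Nearley can work on tokens from a separate tokenizer (its documentation describes this with the moo lexer and the @lexer directive), and grammar rules can then match on token types. The sketch below is plain JavaScript with no dependencies, just to show the idea. The FUNCTIONS list, the tokenize function and the NUM token type are assumptions for illustration; only the SY and FN names come from the grammar discussed above.

```javascript
// Assumed list of function names; the real grammar may accept a different set.
const FUNCTIONS = new Set(["sin", "cos", "tan", "log", "exp", "sqrt"]);

// Hypothetical tokenizer: every identifier becomes exactly one token,
// either FN (function name) or SY (symbol), never both.
function tokenize(input) {
  // Sticky regex: optional whitespace, then a word, a number, or one operator.
  const pattern = /\s*(?:([A-Za-z]+)|(\d+(?:\.\d+)?)|([-+*\/^()]))/y;
  const tokens = [];
  let pos = 0;
  while (pos < input.length) {
    pattern.lastIndex = pos;
    const match = pattern.exec(input);
    if (match === null) {
      if (input.slice(pos).trim() === "") break; // only trailing whitespace left
      throw new Error(`Unexpected input near offset ${pos}`);
    }
    const [, word, number, punct] = match;
    if (word !== undefined) {
      // A known function name directly followed by "(" is a call;
      // any other identifier is treated as a symbol.
      const isCall = FUNCTIONS.has(word) && input[pattern.lastIndex] === "(";
      tokens.push({ type: isCall ? "FN" : "SY", value: word });
    } else if (number !== undefined) {
      tokens.push({ type: "NUM", value: number });
    } else {
      tokens.push({ type: punct, value: punct });
    }
    pos = pattern.lastIndex;
  }
  return tokens;
}

console.log(tokenize("a*sin(x)*y").map((t) => t.type).join(" "));
// prints "SY * FN ( SY ) * SY"
```

With tokens like these, sin in a*sin(x)*y reaches the grammar only as FN, while a bare sin with no parenthesis would still be an ordinary symbol. Whether this is enough depends on the rest of the grammar, which may be ambiguous in other ways as well.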


There are 0 answers