How should a unary operator be applied in an expression grammar?

605 views Asked by At

I have a grammar that I've implemented in C#. My question relates to evaluating expressions. A cutdown version of the grammar just relating to expressions is. Note that the grammar also supports logical operations (using AND and OR, rather than && and ||):

Expression => [ NegOp ] <Term> { <AddOp> <Term> }
Term =>       <Power> { <MulOp> <Power> }
Power =>      <Factor> [ ^ <Factor> ]
Factor =>     <Numeric> | ( OPENPAREN <Expression> CLOSEPAREN )
Numeric =>    <Float> | <Integer>
NegOp =>      + | - | NOT
AddOp =>      + | - | OR
MulOp =>      * | / | % | AND

I've implemented Left to Right evaluation of expressions, so that, for instance:

5 / 6 / 7

is interpreted as:

(5 / 6) / 7

However, I'm unclear how to apply the NegOp to the Expression. If I have the Expression:

5 - 4 * 3    => -7

and I prefix it with a minus sign, thus:

-5 - 4 * 3

should this be interpreted as:

-(5 - 4 * 3)      => 7

or:

(-5) - 4 * 3      => -17

i.e. Should the NegOp bind tightly to the first Term in the Expression, or apply to the Expression as a whole after evaluation?

1

There are 1 answers

0
rici On

Normally, unary prefix operators are located almost at the top of the operator precedence chain, just below postfix operators. One way to do that would be to add them to what you call Factor:

Factor =>     <Numeric> | OPENPAREN <Expression> CLOSEPAREN | NegOp Factor

That could be written as { NegOp } ( <Numeric> | OPENPAREN <Expression> CLOSEPAREN ) but I feel that using "a repetition of NegOp" inappropriately separates the operator(s) from the operand. Also, some people might be inclined to use a single optional [ NegOp ] instead of arbitrary repetition, but that's seems hard to justify. Is there any good reason to disallow the use of two different unary operators on the same operand? But those are decisions you'll have to make.

However, in the specific case of unary minus in combination with exponentiation, it is moderately common but not universal to give exponentiation priority, since you would practically never want -2^20 to mean (-2)^20, which is mathematically the same as 2^20. I'd be inclined to go with the majority view on this, but both decisions are legitimate.

Exponentation itself is usually right-associative, unlike other mathematical operators, again because the other grouping doesn't add anything useful. (a^b)^c is exactly the same as a^(b*c), and would usually be better computed as the second; the expression you normally want is a^(b^c). I don't know of any commonly-used language with exponentiation in which exponentation groups to the left. Your decision to make exponentiation non-grouping also seems to me eccentric. But it's your language.


It's maybe also worth adding that not all prefix operators have naturally high precedence. But it depends on what you mean by "operator".

For example, most languages have constructs like

return x + y
assert i < n

These might look more like "statements" than "prefix operators", but syntactically the difference is minor. It's true that they don't return values, but there are variants which do, such as Python's yield operator.

Also, some languages have constructs like

lambda t: 42 * t         # There are many ways to spell lambda
let t = f(a) in t + 6

These are both bracketed prefix operators (in the same way that the subscript index operator v[i] is a bracketed postfix operator or that the C family's "ternary operator" is really a bracketed infix operator with low precedence. The bracketed part does contain an operand, but it's effectively fully parenthesized so you don't need to worry about it during the parse. So again they are syntactically the same as a prefix operator.

All of these operators are at (or close to) the bottom of the precedence hierarchy; they basically take "the rest of the subexpression" to be their argument. (In other words, the argument continues until terminated with a close parenthesis or whatever passes for a statement delimiter in the language. So if you have those things, your precedence hierarchy will have prefix operators close to both ends of the precedence list. But that's fine. The parse still works.