I'm in the midst of writing a lexical scanner, and I'm wondering how I would distinguish between an operator (e.g. -) and a signed number (e.g. -14). For example, both of the following lines are valid:
+12
12 +12
Currently, my lexical scanner would parse them like so:
+12
12
+12
However, when checking the second statement's validity later in the program, it is flagged as invalid: one numeric token can't be followed by another without an adjoining operator. I would like them scanned as:
+12
12
+
12
I could implement this by checking whether the preceding character is an operator: if so, generate a signed-number token; otherwise, generate an operator followed by a number. But as I understand it, making the scanner depend on context like that would be against the rules of context-free grammars, and it would drastically increase the complexity of my scanner.
How might I scan signed numbers in an unambiguous way that correctly determines what is and isn't part of a numeric token?
Most scanners I've dealt with treat signs as operators. So -12 isn't a single integer literal; it's a unary sign operator followed by an integer literal. I think that would solve your problem while keeping your scanner simple: it only has to recognize + and - as tokens, and your parser can work out later whether each one is a unary sign or a binary operator.
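To make that concrete, here is a minimal sketch in Python, not a definitive implementation. It assumes only unsigned integers and the + and - operators; the names `tokenize` and `evaluate` and the `("NUM", ...)` / `("OP", ...)` token shapes are made up for illustration. The scanner never looks at context, and the parser decides that a sign is unary whenever it appears where an operand is expected.

```python
DIGITS = "0123456789"

def tokenize(text):
    """Scan text into ("NUM", int) and ("OP", str) tokens.

    Signs are always emitted as OP tokens; the scanner never tries
    to decide whether a sign belongs to the number that follows it.
    """
    tokens = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch.isspace():
            i += 1
        elif ch in DIGITS:
            start = i
            while i < len(text) and text[i] in DIGITS:
                i += 1
            tokens.append(("NUM", int(text[start:i])))
        elif ch in "+-":
            tokens.append(("OP", ch))
            i += 1
        else:
            raise ValueError(f"unexpected character {ch!r}")
    return tokens

def evaluate(tokens):
    """Parse and evaluate: expr := unary (("+" | "-") unary)*"""
    pos = 0

    def unary():
        # unary := ("+" | "-") unary | NUM
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError("unexpected end of input")
        kind, value = tokens[pos]
        pos += 1
        if kind == "NUM":
            return value
        # A sign where an operand is expected is a unary operator.
        operand = unary()
        return -operand if value == "-" else operand

    result = unary()
    while pos < len(tokens):
        kind, value = tokens[pos]
        if kind != "OP":
            raise SyntaxError("expected an operator between numbers")
        pos += 1
        right = unary()
        result = result - right if value == "-" else result + right
    return result
```

With this split, `tokenize("12 +12")` yields a number, an operator, and a number, so the "two numbers in a row" check no longer fires, while `tokenize("+12")` yields an operator followed by a number that the parser treats as a unary sign. An input such as `12 12` is still rejected by the parser.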