I'm trying to create a regex, in RE2C's regex format (1), for matching binary literal numbers. They should look like:
- 0b1, 0b101, 0b1111, 0b11_11, 0b1_111, 0b1_1_1_1, etc
The underscore is a convenience separator and is ignored when extracting the resulting digits. However, a separator may only appear between two digits (not at the beginning or the end), and two or more consecutive underscores are not allowed.
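To make these rules concrete, here is a minimal plain-C sketch of the intended behaviour, independent of re2c. The function name `parse_binary_literal` and the 64-digit cap are assumptions made for illustration only; they are not part of re2c or of the grammar described above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: validates a whole NUL-terminated string as a binary
 * literal and, on success, stores its value in *out.
 * Rules: "0b" prefix, at least one digit, '_' only between two digits. */
bool parse_binary_literal(const char *s, uint64_t *out)
{
    assert(s != NULL && out != NULL);
    if (s[0] != '0' || s[1] != 'b')
        return false;

    uint64_t value = 0;
    int ndigits = 0;
    bool prev_was_digit = false;   /* was the previous character 0 or 1? */

    for (const char *p = s + 2; *p != '\0'; ++p) {
        if (*p == '0' || *p == '1') {
            if (++ndigits > 64)
                return false;      /* simplification: reject instead of overflowing */
            value = (value << 1) | (uint64_t)(*p - '0');
            prev_was_digit = true;
        } else if (*p == '_' && prev_was_digit) {
            prev_was_digit = false; /* separator is skipped; a digit must follow */
        } else {
            return false;          /* leading or doubled '_', or a foreign character */
        }
    }
    if (!prev_was_digit)
        return false;              /* "0b" alone, or a trailing '_' */
    *out = value;
    return true;
}
```

With this sketch, "0b11_11" yields 15, while "0b1_0_", "0b1__1" and "0b_1" are all rejected.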
This is the regex that makes sense to me:
BINARY_NUM = "0b" ("0"|"1") ("_"? ("0"|"1"))*;
I'm trying to say:
- it starts with "0b"
- it is followed by one digit of 0 or 1.
- it is followed by any number of these combinations, in any order: "0", "1", "_0", "_1"
However, the regex above also matches a trailing "_". So, these 2 are matched equivalently:
- 0b1_0
- 0b1_0_
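One possible reading of this behaviour, assuming the lexer takes the longest prefix that matches the rule and then stops: the pattern itself never consumes the final "_", it just leaves it in the input for the next rule. The hand-written C sketch below mimics that prefix matching (`binary_prefix_len` is a hypothetical helper, not something re2c generates):

```c
#include <assert.h>
#include <stddef.h>

static int is_bit(char c) { return c == '0' || c == '1'; }

/* Length of the longest prefix of s that matches
 *     "0b" [01] ("_"? [01])*
 * or 0 if no prefix matches. */
size_t binary_prefix_len(const char *s)
{
    assert(s != NULL);
    if (s[0] != '0' || s[1] != 'b' || !is_bit(s[2]))
        return 0;

    size_t n = 3;                            /* "0b" plus the first digit */
    for (;;) {
        if (is_bit(s[n]))
            n += 1;                          /* plain digit */
        else if (s[n] == '_' && is_bit(s[n + 1]))
            n += 2;                          /* separator followed by a digit */
        else
            return n;                        /* anything else ends the match */
    }
}
```

Both "0b1_0" and "0b1_0_" give a match length of 5 here, so the two inputs produce the same token, and in the second case the "_" is left over as unconsumed input.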
How can I prevent the matching of trailing "_"?
Based purely on this documentation, a form of lookahead is supported. So one might hope that this would work:
or slightly more compactly:
although something more sophisticated would be needed, since the example above implies that:
would be parsed as 0b010101010 followed by 222.

However, I discovered that "trailing contexts are not allowed in named definitions" when I tried substituting the above into the introductory sample code in the manual.
Modifying it with the "sentinel" example, I get:
This successfully rejects a trailing "_".

It is also possible to just include the null sentinel directly:
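As a plain-C illustration of the same idea (not re2c output), the sketch below requires the digits to run right up to the NUL sentinel, which is what a rule of the shape `"0b" [01] ("_"? [01])* [\x00]` would demand. Both that rule shape and the helper name `is_binary_token` are assumptions made for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: true only if the whole NUL-terminated string is
 * "0b", one digit, then any number of ("_"? digit) groups, ending exactly
 * at the NUL sentinel. */
bool is_binary_token(const char *s)
{
    assert(s != NULL);
    if (s[0] != '0' || s[1] != 'b')
        return false;

    const char *p = s + 2;
    if (*p != '0' && *p != '1')
        return false;              /* at least one digit is required */
    ++p;

    while (*p != '\0') {           /* stop only at the sentinel */
        if (*p == '_')
            ++p;                   /* optional separator */
        if (*p != '0' && *p != '1')
            return false;          /* a digit must follow: rejects "_" at the end and "__" */
        ++p;
    }
    return true;
}
```

Under this check "0b1_0" is accepted and "0b1_0_" is rejected, because the character after the last "_" is the sentinel rather than a digit.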