I would like to create a grammar rule for a printable character (any character for which the C isprint() function returns true).
For this purpose I created the following regex rule in my lex file:
[\x20-\x7E] { yylval.ch = strdup(yytext); return CHARACTER; }
The regular expression covers all the printable characters by their ASCII hexadecimal values, from 0x20 (space) to 0x7E (~).
On my first attempt this rule was at the bottom of the file, but any printable character already matched by an earlier rule was obviously never returned as CHARACTER. For example, if my input was the character '+' and I had an earlier rule:
"+" { return PLUS_OPERATOR; }
The scanner returned it to the parser as PLUS_OPERATOR and not as CHARACTER.
Then I tried placing the character rule at the top of my scanner, and for the same reason none of the following single-character rules for printable characters could ever be matched.
My question is: how can I have a rule that matches every printable character and also rules for specific characters?
The only thing I can think of is to put it at the bottom and use a grammar rule that combines all the one-character tokens with the character token (e.g. CHAR : PLUS_OPERATOR | MINUS_OPERATOR | EQUAL_OPERATOR | CHARACTER).
I have many more than three one-character rules in my lex file, so I'm looking for a more elegant solution.
The only solution is the one you propose: create a non-terminal which is the union of all the relevant terminals, since the scanner can return only one token type for each lexeme.
Personally, I find grammars much more readable if single-character tokens are written as themselves, so in the bison file I would list character literals such as '+' in that non-terminal instead of named tokens like PLUS_OPERATOR, and in the scanner I would return the character itself as the token code.
That in turn requires the semantic type to be a union with both char and char* fields; the advantage is that you don't need to worry about freeing the strings created for operator characters.
That is about as elegant as it gets, I'm afraid.