Creating a rule for a printable character in lex/yacc

I would like to create a grammar rule for a printable character (any character for which the C isprint() function returns true).

For this purpose I created the following rule in my lex file:

[\x20-\x7E] { yylval.ch = strdup(yytext); return CHARACTER; }

The character class covers every printable ASCII character, from 0x20 (space) to 0x7E ('~').
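As a quick check of that claim, here is a small C sketch that compares the range test against isprint() for every 7-bit ASCII code. The helper names in_lex_class and count_mismatches are made up for illustration. It assumes the "C" locale, because isprint() is locale-dependent and other locales may treat some bytes above 0x7F as printable, which the lex class would not match.

```c
#include <assert.h>
#include <ctype.h>
#include <locale.h>

/* Hypothetical helper: the same test as the lex class [\x20-\x7E]. */
static int in_lex_class(int c) {
    return c >= 0x20 && c <= 0x7E;
}

/* Hypothetical helper: counts the 7-bit ASCII codes on which isprint()
   and the lex character class disagree, in the "C" locale. */
static int count_mismatches(void) {
    const char *loc = setlocale(LC_CTYPE, "C");  /* isprint() depends on LC_CTYPE */
    assert(loc != NULL);                         /* the "C" locale always exists */
    (void)loc;

    int mismatches = 0;
    for (int c = 0; c < 128; c++) {
        if ((isprint(c) != 0) != in_lex_class(c))
            mismatches++;
    }
    return mismatches;
}
```

In the "C" locale the two tests agree on all 128 codes, so count_mismatches() returns 0; space counts as printable, while tab and DEL (0x7F) do not.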

On my first attempt I placed this rule at the bottom, but then any printable character already matched by an earlier rule was never returned as CHARACTER. For example, if my input was the character '+' and I had an earlier rule:

"+" { return PLUS_OPERATOR; }

the '+' was returned as a PLUS_OPERATOR token, not as CHARACTER.

Then I tried placing the character rule at the top of my scanner, and for the same reason (when two rules match the same text, the one listed first wins) none of the following rules for characters in the printable range could ever match.
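Both attempts fail because of how flex chooses a rule: it takes the longest match, and when several rules match the same length of input, the rule listed first wins. A catch-all single-character rule ties with every other single-character rule, so whichever comes first shadows the other. The C sketch below is a simplified model of that tie-break for one-character rules only; the token codes and the struct rule table are hypothetical and are not how flex represents rules internally.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical token codes, for illustration only. */
enum { PLUS_OPERATOR = 258, MINUS_OPERATOR, EQUAL_OPERATOR, CHARACTER };
enum { NO_MATCH = -1 };

/* A simplified one-character rule: "set" lists the characters it
   matches; NULL stands for the catch-all class [\x20-\x7E]. */
struct rule {
    const char *set;
    int token;
};

static int rule_matches(const struct rule *r, char c) {
    if (r->set == NULL)
        return c >= 0x20 && c <= 0x7E;
    return c != '\0' && strchr(r->set, c) != NULL;
}

/* Every rule here matches exactly one character, so all matches have the
   same length and the tie-break decides: the earliest matching rule wins. */
static int scan_one(const struct rule *rules, size_t n, char c) {
    assert(rules != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (rule_matches(&rules[i], c))
            return rules[i].token;
    return NO_MATCH;
}

/* First attempt: catch-all last. Second attempt: catch-all first. */
static const struct rule catch_all_last[] = {
    { "+", PLUS_OPERATOR },
    { NULL, CHARACTER },
};
static const struct rule catch_all_first[] = {
    { NULL, CHARACTER },
    { "+", PLUS_OPERATOR },
};
```

With catch_all_last, '+' yields PLUS_OPERATOR and only other printable characters yield CHARACTER; with catch_all_first, '+' yields CHARACTER and the PLUS_OPERATOR rule is unreachable. In real flex, a character matched by no rule falls through to the default rule, which echoes it; the sketch reports NO_MATCH instead.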

My question: how can I create a rule that matches all printable characters while still keeping rules for specific characters?

The only thing I can think of is to put it at the bottom and write a grammar rule that combines all the single-character tokens with the character token (e.g. CHAR : PLUS_OPERATOR | MINUS_OPERATOR | EQUAL_OPERATOR | CHARACTER).

I have many more than three single-character rules in my lex file, so I am looking for a more elegant solution.
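To see why this workaround grows with the number of operators, here is a C sketch of the value the CHAR non-terminal has to compute. It assumes the scanner described above: operator rules such as "+" { return PLUS_OPERATOR; } set no yylval, so each operator's character has to be restored by hand, while CHARACTER carries the strdup'ed text. The token codes and the function char_of_token are hypothetical; in a bison file each case would become the action of one alternative, for example PLUS_OPERATOR { $$ = '+'; }.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical token codes; in a real parser bison generates these. */
enum { PLUS_OPERATOR = 258, MINUS_OPERATOR, EQUAL_OPERATOR, CHARACTER };

/* What the actions of
     CHAR : PLUS_OPERATOR | MINUS_OPERATOR | EQUAL_OPERATOR | CHARACTER
   must produce: one case per operator token, because those tokens carry
   no semantic value. Returns '\0' for tokens that are not part of CHAR. */
static char char_of_token(int token, const char *text) {
    switch (token) {
    case PLUS_OPERATOR:  return '+';
    case MINUS_OPERATOR: return '-';
    case EQUAL_OPERATOR: return '=';
    case CHARACTER:
        /* text is the single-character string from strdup(yytext) */
        assert(text != NULL && strlen(text) == 1);
        return text[0];
    default:
        return '\0';
    }
}
```

Every new single-character rule in the scanner needs both a new alternative in CHAR and a new case here, which is the repetition the question is trying to avoid.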

Best answer, by rici:

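The only solution is the one you propose: create a non-terminal which is the union of all the relevant terminals. The scanner has to pick exactly one token for each character, so the grouping can only happen in the grammar.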

Personally, I find grammars much more readable if single-character tokens are written as themselves, so I would write:

printable: '+' | '-' | '=' | CHAR

in the bison file, and in the scanner:

[-+=]        { yylval.ch = yytext[0]; return yylval.ch; } 
[[:print:]]  { yylval.ch = yytext[0]; return CHAR; }

(which in turn requires the semantic type to be a union with both char and char* fields; the advantage is that no strings are created for operator characters, so there is nothing to free.)
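Here is a C sketch of the same idea outside flex and bison, to show how the pieces fit: single-character operators use their own character code as the token number, every other printable character becomes CHAR, and in both cases the character travels in the char member of the value union. The names value_t, lval, scan_char and is_printable_token are hypothetical stand-ins for YYSTYPE, yylval, the two scanner rules and the printable rule; the value 258 for CHAR is an assumption, chosen above 255 so it cannot collide with a character token, which is also how bison numbers named tokens.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical stand-in for the bison %union with char and char* fields. */
typedef union {
    char ch;     /* single characters: operators and CHAR */
    char *str;   /* strings from other rules, e.g. identifiers */
} value_t;

enum { CHAR = 258 };   /* assumed code, above every character code */

static value_t lval;   /* plays the role of yylval */

/* Simulates the two scanner rules for one input character:
     [-+=]        { yylval.ch = yytext[0]; return yylval.ch; }
     [[:print:]]  { yylval.ch = yytext[0]; return CHAR; }
   Returns -1 for characters neither rule matches. */
static int scan_char(char c) {
    if (c != '\0' && strchr("-+=", c) != NULL) {
        lval.ch = c;
        return (unsigned char)c;   /* the character is its own token code */
    }
    if (isprint((unsigned char)c)) {
        lval.ch = c;
        return CHAR;
    }
    return -1;
}

/* Simulates: printable: '+' | '-' | '=' | CHAR
   Whichever alternative matched, the character is in lval.ch. */
static int is_printable_token(int token) {
    return token == '+' || token == '-' || token == '=' || token == CHAR;
}
```

For example, scan_char('+') returns '+' and scan_char('a') returns CHAR, both leaving the character in lval.ch, and is_printable_token accepts either result. Since lval.ch is a plain char, nothing is allocated for operator characters.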

That is about as elegant as it gets, I'm afraid.