Bison shift/reduce and reduce/reduce conflict

243 views Asked by At

OK, I have tried to rewrite this Bison grammar three times now and keep running into shift/reduce and reduce/reduce conflicts. The Syntax that I am trying to parse is as follows. The items in the {...} are for one or the other. The items i n the [...] are optional.

CONSTANT constant-name constant-class 
    constant-class = { EQUALS expression numeric-options } 
                     { EQUALS STRING string string-options } 
    numeric-options = [ PREFIX prefix-string ] 
                      [ TAG tag-string ]  
                      [ COUNTER #local-name ]
                      [ TYPENAME type-name ] ; 
    string-options = [ PREFIX prefix-string ] 
                     [ TAG tag-string ] ; 

CONSTANT (constant-name,...) EQUALS expression 
                 [ INCREMENT expression ] 
                 [ PREFIX prefix-string ] 
                 [ TAG tag-string ] 
                 [ COUNTER #local-name ] 
                 [ TYPENAME type-name ]; 

CONSTANT  constant-name EQUALS expression, 
                . 
                . 
                . 
 ;

CONSTANT  (constant-name,...) EQUALS expression, 
                . 
                . 
                . 
 ; 

I'm having issues with the getting the last three to all work. I can get any one of them to work, but not all 4 of them. I right now have one shift/reduce and one reduce/reduce conflict. The grammar I have is as follows:

constant
    : SDL_K_CONSTANT constant_style ';'
    ;

constant_style
    : constant_name constant_class
    | constant_list
    | constant_set
    ;

constant_name
    : sdl_name
    ;

constant_class
    : SDL_K_EQUALS sdl_decimal
    | SDL_K_EQUALS SDL_K_STRING sdl_string
    ;

constant_names
    : constant_name
    | constant_names ',' constant_name
    | '(' constant_names ')'
    ;

names_equal
    : constant_names SDL_K_EQUALS sdl_decimal
    ;

constant_list
    : names_equal
    ;

constant_set
    : names_equal
    | constant_set ',' names_equal
    ;

I think the names are self documenting (at least you should be able to understand the types). Any help would be greatly appreciated. I have a feeling that I either simplified too much or not enough.

NOTE: I figured out how to edit my post and removed the options and changed SDL_K_COMMA and SDL_K_SEMI to ',' and ';', respectively.

Thanks.

Here are some examples that should parse:

CONSTANT block_node_size EQUALS 24;
CONSTANT Strcon EQUALS STRING "This is a string constant" PREFIX Jg$ 
#block_size = 24;
CONSTANT block_node_size EQUALS #block_size;
CONSTANT 
    xyz EQUALS 10, 
    alpha EQUALS 0, 
    noname EQUALS 63;
CONSTANT
    (zyx, nameless) EQUALS 10,
    (beta) EQUALS 1,
    gamma EQUALS 42;
CONSTANT ( 
    bits, 
    bytes, 
    words, 
    longs, 
    quads, 
    octas 
    ) EQUALS 0 INCREMENT 1 PREFIX ctx$;
CONSTANT 
    (bad_block,bad_data,,,, 
    overlay,rewrite) EQUALS 0 INCREMENT 4;
CONSTANT (pli,c,bliss,macro) 
    EQUALS 4 INCREMENT 4 PREFIX lang$ 
    COUNTER #lang;
CONSTANT (basic,pascal,fortran) 
    EQUALS #lang + 4 INCREMENT 4 PREFIX lang$;

I hope this helps.

BTW: Here is the EBNF (sort of) for this:

/*
 * Define the CONSTANT construct (Left out Expression).
 */
Str                 ::= "\"" Printable* "\""
Name                ::= "$" | "_" | [A-Za-z] ("$" | "_" | [A-Z0-9a-z])*
Names               ::= Name | ("(" Name ("," Name )* ")")
Constant_class      ::= "EQUALS" (Expression Numeric_options | "STRING" Str 
                        String_options)
String_options      ::= Prefix? Tag?
Numeric_options     ::= String_options Counter? Typename? Radix?
Increment_options   ::= Increment? Numeric_options
Constant_list       ::= Names "EQUALS" Expression Increment_options
Constant_set        ::= Names Equals Expression
                        ("," Names "EQUALS" Expression)*
Constant            ::= "CONSTANT" (Name Constant_class | Constant_list | 
                        Constant_set) ";"?
Prefix              ::= "PREFIX" Str
Tag                 ::= "TAG" Str
Radix               ::= "RADIX" ("DEC" | "OCT" | "HEX")
Counter             ::= "COUNTER" Variable
Increment           ::= "INCREMENT" Expression
Typename            ::= "TYPENAME" Name

I think that's about it.

1

There are 1 answers

9
rici On BEST ANSWER

It's a little hard for me to understand what you are actually trying to do, so I've made some assumptions and provided some alternatives below. I hope that it comes close.

The basic issue is ambiguity in your grammar specifications. One of these might just be a mistake: according to your templates, the left-hand hand side of the EQUAL seems to be either a single name or a comma-separated list of names surrounded by parentheses. However, your grammar allows a comma-separated list of names, with the possibility that the first (or only) item in the list could be a parenthesized lists of names:

constant_names
    : constant_name
    | constant_names ',' constant_name
    | '(' constant_names ')

That will match a, a, b, (a, b), (a, b), c and (a, b), c, d. BUt I think only the first and third of those are actually intended.

In any event, you have two productions:

constant_style
    : constant_name constant_class
    | constant_list

For the first one, we have:

constant_class
    : SDL_K_EQUALS sdl_decimal

while for the second, we have:

constant_list: names_equal
names_equal
    : constant_names SDL_K_EQUALS sdl_decimal

Since constant_name can match a single name, there are two different ways to match answer = 42, which will inevitably lead to a parsing conflict.

But there's yet another possible parse for answer = 42:

constant_set
    : names_equal

So let's start by simplifying all that stuff, and then we can maybe get back to what (might be) your original goal. The idea is to factor everything which is syntactically similar:

constant-stmt  : "CONSTANT" clause-list ';'
clause-list    : clause | clause-list ',' clause
clause         : left-hand-side "EQUALS" right-hand-side
left-hand-side : name | '(' name-list ')'
name-list      : name | name-list ',' name
right-hand-side: expression   /* See below */

I hope that is all simple enough to understand.

But we can see from the original that (at least in certain circumstances), there are many more options for right-hand-side than appear in the above snippet. It would be trivial to just add the other syntaxes to right-hand-side. However, it appears that the intention is that these extra options are only available in the case that there is a single clause. In that case, we could do something like this:

constant-stmt  : "CONSTANT" constant-body ';'
constant-body  : complex-clause | clause-list
clause-list    : clause | clause-list ',' clause
clause         : left-hand-side "EQUALS" right-hand-side
right-hand-side: expression
complex-clause : left-hand-side "EQUALS" complex-rhs
complex-rhs    : expression numeric-options
               | "STRING" string-literal string-options

But unfortunately, that is back to being ambiguous because numeric-options might be empty, so expression will match both right-hand-side and complex-right-hand-side.

In practice, this ambiguity doesn't matter. There is no semantic difference between the declaration name EQUALS expression as the only declaration in a CONSTANT declaration or as one of a list of such declarations. So one possibility is to just ignore the resulting reduce/reduce conflict, possibly by putting %expect 1 into the file. But that's not very pleasant really. So we will try to eliminate the problem.

One way would be to insist that the first complex-rhs have at least one numeric-option. But that's going to be truly annoying. First, we'd have to create yet another clause type -- first-complex-clause or some such -- and second, we'd have to write out the requirement that at least one option be present.

As a simpler alternative, we can just require that a clause-list have at least two clauses, rather than just one. Since every clause can also match complex-clause, the restriction on clause-list will not reject any valid clause. So that gives us the very minor change:

constant-body  : complex-clause | clause-list
clause-list    : clause ',' clause | clause-list ',' clause

Addendum

With a more precise description of the intended syntax (although it is still missing some details), I modified the above grammar to attempt to parse the full language. The basic principle remains intact: a definition consists of a single complex-clause (which includes the simple clause case), or a list of at least two simple clauses. The only difference is the way the two clause types are defined.

I also fixed the name-list production so that individual names can be omitted (eg. (bad_block,bad_data,,,,overlay,rewrite)). As indicated in the guide, the list must contain at least one name, which slightly complicates the grammar.

I added the options from the guide (but not the extra ones in the EBNF). I did not try to deal with omitted semicolons, which don't seem to be allowed in the guide although there is an example of a declaration without a trailing semicolon. (That could be a typo.) I added the definition of expression, to the best of my understanding.

I also added the local-name assignment statement because it was in the test file and it was easy.

Here's the grammar:

definition      : %empty
                | definition constant-defn
                | definition local-assign
local-assign    : LOCAL_NAME '=' expression ';'
constant-defn   : "CONSTANT" constant-body ';'
constant-body   : complex-clause | clause-list
clause-list     : clause ',' clause | clause-list ',' clause
clause          : name "EQUALS" expression
                | name-list "EQUALS" expression
complex-clause  : name "EQUALS" expression numeric-options
                | name-list "EQUALS" expression list-options
                | name "EQUALS" "STRING" string-literal string-options
                | name-list "EQUALS" "STRING" string-literal string-options
name-list       : '(' names ')'
names           : optional-commas name | names ',' name | names ','
optional-commas : %empty | optional-commas ',' 
string-options  : prefix-option tag-option
numeric-options : string-options counter-option typename-option
list-options    : increment-option numeric-options
increment-option: %empty | "INCREMENT" expression
prefix-option   : %empty | "PREFIX" prefix-name
tag-option      : %empty | "TAG" tag-name
counter-option  : %empty | "COUNTER" LOCAL_NAME
typename-option : %empty | "TYPENAME" name

expression      : '-' expression %prec UNOP
                | expression '*' expression
                | expression '/' expression
                | expression '+' expression
                | expression '-' expression
                | expression '@' expression
                | expression '&' expression
                | expression '!' expression
                | '(' expression ')'
                | INTEGER
                | IDENT
                | LOCAL_NAME
name            : IDENT | QUOTED_IDENT
prefix-name     : IDENT | QUOTED_NULL | QUOTED_IDENT
tag-name        : IDENT | QUOTED_NULL | QUOTED_IDENT | QUOTED_TAG
string-literal  : QUOTED_NULL | QUOTED_IDENT | QUOTED_TAG | QUOTED_STRING

You'll see that I added a discrimination of various types of quoted strings so that it is possible to deal with (most of) the different contexts in which quoted strings may appear. I didn't include the use of quoted strings of up to 4 characters as integer constants, as indicated in the guide, because I didn't notice it in time, and also because I couldn't figure out with a brief reading whether the intent is to allow constants in expressions whose names must be quoted (because the name conflicts with a keyword); in that case, it seems to me that there is an ambiguity with a quoted name with four or fewer characters.

In case it's not obvious how the discrimination works (and it probably isn't), here's the accompanying flex definition:

id   [[:alpha:]$_][[:alnum:]$_]*

%%

[[:space:]]+         ;

CONSTANT             { return K_CONSTANT; }
COUNTER              { return K_COUNTER; }
EQUALS               { return K_EQUALS; }
INCREMENT            { return K_INCREMENT; }
PREFIX               { return K_PREFIX; }
STRING               { return K_STRING; }
TAG                  { return K_TAG; }
TYPENAME             { return K_TYPENAME; }

[#]{id}              { yylval.str = strndup(yytext, yyleng); return LOCAL_NAME; }
{id}                 { yylval.str = strndup(yytext, yyleng); return IDENT; }
["]{id}["]           { yylval.str = strndup(yytext+1, yyleng-2); return QUOTED_IDENT; }

["]["]               { yylval.str = strndup(yytext+1, yyleng-2); return QUOTED_NULL; }
["][[:alnum:]$_]+["] { yylval.str = strndup(yytext+1, yyleng-2); return QUOTED_TAG; }
["][[:print:]]+["]   { yylval.str = strndup(yytext+1, yyleng-2); return QUOTED_STRING; }

[[:digit:]]+         { yylval.dec = strtoul(yytext, 0, 0); return INTEGER; }
.                    { return *yytext; }

I did a rather inadequate test by running it with the definitions at the end of the OP (although I added a trailing ; in the second line). I didn't actually check that the parse tree was correct, but it certainly does all pass through the parser.