OK, I have tried to rewrite this Bison grammar three times now and keep running into shift/reduce and reduce/reduce conflicts. The Syntax that I am trying to parse is as follows. The items in the {...} are for one or the other. The items i n the [...] are optional.
CONSTANT constant-name constant-class
constant-class = { EQUALS expression numeric-options }
{ EQUALS STRING string string-options }
numeric-options = [ PREFIX prefix-string ]
[ TAG tag-string ]
[ COUNTER #local-name ]
[ TYPENAME type-name ] ;
string-options = [ PREFIX prefix-string ]
[ TAG tag-string ] ;
CONSTANT (constant-name,...) EQUALS expression
[ INCREMENT expression ]
[ PREFIX prefix-string ]
[ TAG tag-string ]
[ COUNTER #local-name ]
[ TYPENAME type-name ];
CONSTANT constant-name EQUALS expression,
.
.
.
;
CONSTANT (constant-name,...) EQUALS expression,
.
.
.
;
I'm having issues with the getting the last three to all work. I can get any one of them to work, but not all 4 of them. I right now have one shift/reduce and one reduce/reduce conflict. The grammar I have is as follows:
constant
: SDL_K_CONSTANT constant_style ';'
;
constant_style
: constant_name constant_class
| constant_list
| constant_set
;
constant_name
: sdl_name
;
constant_class
: SDL_K_EQUALS sdl_decimal
| SDL_K_EQUALS SDL_K_STRING sdl_string
;
constant_names
: constant_name
| constant_names ',' constant_name
| '(' constant_names ')'
;
names_equal
: constant_names SDL_K_EQUALS sdl_decimal
;
constant_list
: names_equal
;
constant_set
: names_equal
| constant_set ',' names_equal
;
I think the names are self documenting (at least you should be able to understand the types). Any help would be greatly appreciated. I have a feeling that I either simplified too much or not enough.
NOTE: I figured out how to edit my post and removed the options and changed SDL_K_COMMA and SDL_K_SEMI to ',' and ';', respectively.
Thanks.
Here are some examples that should parse:
CONSTANT block_node_size EQUALS 24;
CONSTANT Strcon EQUALS STRING "This is a string constant" PREFIX Jg$
#block_size = 24;
CONSTANT block_node_size EQUALS #block_size;
CONSTANT
xyz EQUALS 10,
alpha EQUALS 0,
noname EQUALS 63;
CONSTANT
(zyx, nameless) EQUALS 10,
(beta) EQUALS 1,
gamma EQUALS 42;
CONSTANT (
bits,
bytes,
words,
longs,
quads,
octas
) EQUALS 0 INCREMENT 1 PREFIX ctx$;
CONSTANT
(bad_block,bad_data,,,,
overlay,rewrite) EQUALS 0 INCREMENT 4;
CONSTANT (pli,c,bliss,macro)
EQUALS 4 INCREMENT 4 PREFIX lang$
COUNTER #lang;
CONSTANT (basic,pascal,fortran)
EQUALS #lang + 4 INCREMENT 4 PREFIX lang$;
I hope this helps.
BTW: Here is the EBNF (sort of) for this:
/*
* Define the CONSTANT construct (Left out Expression).
*/
Str ::= "\"" Printable* "\""
Name ::= "$" | "_" | [A-Za-z] ("$" | "_" | [A-Z0-9a-z])*
Names ::= Name | ("(" Name ("," Name )* ")")
Constant_class ::= "EQUALS" (Expression Numeric_options | "STRING" Str
String_options)
String_options ::= Prefix? Tag?
Numeric_options ::= String_options Counter? Typename? Radix?
Increment_options ::= Increment? Numeric_options
Constant_list ::= Names "EQUALS" Expression Increment_options
Constant_set ::= Names Equals Expression
("," Names "EQUALS" Expression)*
Constant ::= "CONSTANT" (Name Constant_class | Constant_list |
Constant_set) ";"?
Prefix ::= "PREFIX" Str
Tag ::= "TAG" Str
Radix ::= "RADIX" ("DEC" | "OCT" | "HEX")
Counter ::= "COUNTER" Variable
Increment ::= "INCREMENT" Expression
Typename ::= "TYPENAME" Name
I think that's about it.
It's a little hard for me to understand what you are actually trying to do, so I've made some assumptions and provided some alternatives below. I hope that it comes close.
The basic issue is ambiguity in your grammar specifications. One of these might just be a mistake: according to your templates, the left-hand hand side of the
EQUAL
seems to be either a singlename
or a comma-separated list ofname
s surrounded by parentheses. However, your grammar allows a comma-separated list ofname
s, with the possibility that the first (or only) item in the list could be a parenthesized lists of names:That will match
a
,a, b
,(a, b)
,(a, b), c
and(a, b), c, d
. BUt I think only the first and third of those are actually intended.In any event, you have two productions:
For the first one, we have:
while for the second, we have:
Since
constant_name
can match a singlename
, there are two different ways to matchanswer = 42
, which will inevitably lead to a parsing conflict.But there's yet another possible parse for
answer = 42
:So let's start by simplifying all that stuff, and then we can maybe get back to what (might be) your original goal. The idea is to factor everything which is syntactically similar:
I hope that is all simple enough to understand.
But we can see from the original that (at least in certain circumstances), there are many more options for
right-hand-side
than appear in the above snippet. It would be trivial to just add the other syntaxes toright-hand-side
. However, it appears that the intention is that these extra options are only available in the case that there is a single clause. In that case, we could do something like this:But unfortunately, that is back to being ambiguous because
numeric-options
might be empty, soexpression
will match bothright-hand-side
andcomplex-right-hand-side
.In practice, this ambiguity doesn't matter. There is no semantic difference between the declaration
name EQUALS expression
as the only declaration in aCONSTANT
declaration or as one of a list of such declarations. So one possibility is to just ignore the resulting reduce/reduce conflict, possibly by putting%expect 1
into the file. But that's not very pleasant really. So we will try to eliminate the problem.One way would be to insist that the first
complex-rhs
have at least onenumeric-option
. But that's going to be truly annoying. First, we'd have to create yet another clause type --first-complex-clause
or some such -- and second, we'd have to write out the requirement that at least one option be present.As a simpler alternative, we can just require that a
clause-list
have at least twoclause
s, rather than just one. Since everyclause
can also matchcomplex-clause
, the restriction onclause-list
will not reject any validclause
. So that gives us the very minor change:Addendum
With a more precise description of the intended syntax (although it is still missing some details), I modified the above grammar to attempt to parse the full language. The basic principle remains intact: a definition consists of a single
complex-clause
(which includes the simple clause case), or a list of at least two simpleclause
s. The only difference is the way the two clause types are defined.I also fixed the name-list production so that individual names can be omitted (eg.
(bad_block,bad_data,,,,overlay,rewrite)
). As indicated in the guide, the list must contain at least one name, which slightly complicates the grammar.I added the options from the guide (but not the extra ones in the EBNF). I did not try to deal with omitted semicolons, which don't seem to be allowed in the guide although there is an example of a declaration without a trailing semicolon. (That could be a typo.) I added the definition of
expression
, to the best of my understanding.I also added the local-name assignment statement because it was in the test file and it was easy.
Here's the grammar:
You'll see that I added a discrimination of various types of quoted strings so that it is possible to deal with (most of) the different contexts in which quoted strings may appear. I didn't include the use of quoted strings of up to 4 characters as integer constants, as indicated in the guide, because I didn't notice it in time, and also because I couldn't figure out with a brief reading whether the intent is to allow constants in expressions whose names must be quoted (because the name conflicts with a keyword); in that case, it seems to me that there is an ambiguity with a quoted name with four or fewer characters.
In case it's not obvious how the discrimination works (and it probably isn't), here's the accompanying flex definition:
I did a rather inadequate test by running it with the definitions at the end of the OP (although I added a trailing
;
in the second line). I didn't actually check that the parse tree was correct, but it certainly does all pass through the parser.