reduce/reduce conflict with untyped variables and function calls

163 views Asked by At

i want to create a parser for a dynamically typed language.

in my bison file i have a rule for runtimetyped which is a variable name or a function call.

runtimetyped : T_ID { $$ = create_identifier($1); }
             | call { $$ = $1; }
             ;

i also want to do some basic type checking at compile time. f.e. i don't want to allow things like

x = "string" + 42 <= true;

in the source code, there i want to create a compile time error.

but stuff like

s = "string";
i = 42;
b = true;
x = s + i <= b;

should produce a run time error.

my approach was to have different expressions in the grammar:

expression : bool_expression
           | math_expression
           | string_expression
           ;

and any of these expressions are build up by terms, factors, and so on.
a factor can always also be a runtimetyped which causes reduce/reduce errors.

math_factor : numeric_literal                   { $$ = $1; }
            | runtimetyped                      { $$ = $1; }
            | T_LPAREN math_expression T_RPAREN { $$ = $2; }
            ;

bool_factor : T_BOOL                            { $$ = create_bool($1); }
            | runtimetyped                      { $$ = $1; }
            | compare                           { $$ = $1; }
            | T_LPAREN bool_expression T_RPAREN { $$ = $2; }
            ;

string_expression : T_STRING                                    { $$ = $1; }
                  | runtimetyped                                { $$ = $1; }
                  | string_expression T_STROP string_expression { $$ = create_expression($2, $1, $3); }
                  ;

i run it with bison -v parser.y.

can anyone give me a hint on how this conflict can be resolved and/or what exactly produces the conflict.

thanks in advance.

1

There are 1 answers

1
rici On BEST ANSWER

The best way to type check during compilation is to do an analysis pass over the AST. In order to provide accurate error messages, you'll need to keep location information for every token in the AST, but that's generally useful anyway.

The advantage of doing the semantic analysis after building the AST is that the code is much cleaner, because it doesn't mix semantic analysis with other tasks. It also allows you to make use of more information, such as type declarations. However, if you don't want to do that, you could also do type checking in the actions for each production. That spreads the semantic analysis over the entirety of the grammar, which IMHO is harder to understand, verify, test and maintain. Still, it is possible.

Translating the semantic check into a syntax error is really the worst way to do things. It needlessly complicates the grammar, and makes it harder to generate good error messages, because a type error really is not a syntax error and most people attempting to write code in your language will be puzzled by receiving a syntax error for a syntactically correct but semantically meaningless construct.

Nonetheless, it is possible. However you need to take considerable care to avoid grammatical ambiguities, which will generally show up as reduce/reduce conflicts, which is precisely what you are seeing. (You haven't provided enough of the grammar to diagnose the precise problem, but you can probably do it yourself by processing the grammar with the -v flag to bison and then examining the resulting .output file which will show you the states in which there are conflicts.)

The most likely cause of the conflicts is the resolution of unit productions, in a context where both x_expression and y_expression might be possible (without looking at the lookahead token). Here you might need to perform one of the productions x_expression: x_factor or y_expression: y_factor, which in turn means that you might need one of x_factor: runtimetyped or y_factor: runtimetyped and it may not be possible to make that determination. (This is one of the cases where the LALR(1) state-merging can create "mysterious" conflict.)