bison shift instead of reduce. With reduce/reduce errors

1.2k views Asked by At

In my language i can write

a = 1

b = 2
if true { } else { }
if true { } **Here is the problem**
else {}

My grammer doesnt support newlines between statements. An else can only be used with an if. When i add optionalNL in my rule

IfExpr:
  IF rval optionalNL codeBlock optionalNL ELSE codeBlock
| IF rval optionalNL codeBlock

The optionalNL before the else causes 3 reduce/reduce. Reason is it can reduce using the 2nd rule in IfExpr or reduce to exprLoop where it allows many newlines between expressions.

No matter what i do (i tried writing %prec before optionalNL and ELSE) it always reduces to exprLoop which cases bison to give me a synax error on else. How do i tell bison to shift at this point (to optionalNL else) instead of reduce? (to exprLoop causing else to be an error).

example file to test with

%%
program:
      exprLoop;
exprLoop:
      exprLoop2 expr
    | exprLoop2
exprLoop2:
    | exprLoop2 expr EOS
    | exprLoop2 EOS
    ;   
expr:
      'i' Var optEOS '{' '}'
    | 'i' Var optEOS '{' '}' optEOS 'e' '{' '}'
EOS: '\n'   ;
Var: 'v';
optEOS: | optEOS EOS

%%

//this can be added to the lex file
[iev]                   { return *yytext; }

y.output http://www.pastie.org/707448

Alternative .y and output. You can see it looking ahead seeing a \n and doesnt know to reduce the rule or keep going. I change change the order of the rules to get different results. But it either always expects a \n or always expects an else thus one rule always end up being ignore. state 15

    9 expr: 'i' Var optEOS '{' '}' .  [$end, '\n']
   10     | 'i' Var optEOS '{' '}' . 'e' '{' '}'
   11     | 'i' Var optEOS '{' '}' . '\n' 'e' '{' '}'

    'e'   shift, and go to state 16
    '\n'  shift, and go to state 17

    '\n'      [reduce using rule 9 (expr)]
    $default  reduce using rule 9 (expr)

Thanks to Kinopiko for his answer

I changed his code to have no conflicts then worked on making it more flexible. Heres are my files

test.y

%{
#include <stdio.h>
%}

%%

program: expr                                   { printf ("First expr\n"); }
       | program expr                           { printf ("Another expr\n"); }

expr:
      if optEOS                                 { printf ("IF only\n"); }
    | if optEOS else optEOS                     { printf ("IF/ELSE\n"); }

if:   'i' Var optEOS '{' optEOS '}'
else: 'e' optEOS     '{' optEOS '}'
EOS:  '\n'
Var:  'v'
optEOS:
          | EOS optEOS                          { ;}//printf ("many EOS\n"); }
%%

int main(int argc, char **argv)
{
    int i;

    printf("starting\n");

    if(argc < 2) {
        printf("Reading from stdin\n");
        yyparse();
        return 0;
    }
    for(i = 1; i < argc; i++) {
        FILE *f;
        char fn[260];
        sprintf(fn, "./%s", argv[i]);
        f = fopen(fn, "r");
        if(!f) {
            perror(argv[i]);
            return (1);
        }
        printf("Running '%s'\n", argv[i]);
        yyrestart(f);
        yyparse();
        fclose(f);
        printf("done\n");
    }
    return 0;
}

test.y

%{
#include <stdio.h>
#include "y.tab.h"
%}    
%option noyywrap
%%
[ \t]               { }
\n                  { return *yytext; }
.                   { return *yytext; }
%%
int yyerror ()
{
    printf ("syntax error\n");
    exit (1);
}

a test file that auto ran after compiling

i v { } 
i v { }
e { }
i v { }

e { }
i v { 
} e {
 }
i v { }


i v { } i v { } e { }

i v
{ } i v { } e { } i v { } e { 
} i v {
 } e 
{ }
4

There are 4 answers

6
AudioBubble On BEST ANSWER

I don't understand your problem very well, so I started from scratch:

This is my grammar:

%{
#include <stdio.h>
%}

%%

program: expr                                   { printf ("First expr\n") }
       | program EOS                            { printf ("Ate an EOS\n") }
       | program expr                           { printf ("Another expr\n") }

expr:
      ifeos                                     { printf ("IF only\n"); }
    | ifelse                                    { printf ("IF/ELSE\n"); }

ifelse: ifeos else
      | if else

ifeos: if EOS
     | ifeos EOS

if:   'i' Var optEOS '{' '}'
else: 'e' '{' '}'
EOS:  '\n'
Var:  'v'
optEOS:
          | EOS optEOS                          { printf ("many EOS\n") }
%%

Here is the lexer:

%{
#include <stdio.h>
#include "1763243.tab.h"
%}    
%option noyywrap
%%
[iev\{\}\n]                  { return *yytext; }
\x20                         { }
%%
int yyerror ()
{
    printf ("syntax error\n");
    exit (1);
}
int main () {
    yyparse ();
}

Here is some test input:

i v { } 
i v { }
e { }
i v { }

e { }
i v { } e { }
i v { }

Here is the output:

IF only
First expr
IF/ELSE
Another expr
Ate an EOS
IF/ELSE
Another expr
Ate an EOS
IF/ELSE
Another expr
Ate an EOS
IF only
Another expr

There is a shift/reduce conflict remaining.

1
Simeon Pilgrim On

According to 'Lex & Yacc' the default resolution of the reduce/reduce is the first defined rule, so as you say the exprLoop wins, so I'll assume it is defined first.

But switching the order may not solve the problem how you expect.

Further reading (page 237) it appears that you need more look ahead, which is not an option for standard yacc/bison. But Bison does have a GLR mode, which may be of use.

1
DigitalZebra On

One thing you can do is parse out newlines completely using a lex rule for them. This way, it doesn't matter where the newlines are. This is what C/C++ do... newlines are largely ignored.

1
Chris Dodd On

The problem is that:

IfExpr:
  IF rval optionalNL codeBlock optionalNL ELSE codeBlock
| IF rval optionalNL codeBlock

requires two-token lookahead after the codeblock to see the 'else' after the newline if that's what there is. You can avoid this by duplicating the optionalNL in both if rules:

IfExpr:
  IF rval optionalNL codeBlock optionalNL ELSE codeBlock
| IF rval optionalNL codeBlock optionalNL

Now the parser doesn't have to decide between the two rules until after the optionalNL is parsed, letting it see the ELSE (or its lack) in the one-token lookahead.

A potential drawback here is that the second if rule (but not the first) will now absorb any trailing newlines, so if your grammar for programs requires a newline between each statement, it won't find one after ifs without elses and its already been consumed.