What are the Bison/yacc grammars for these

348 views Asked by At

I am trying to learn Flex/Bison and the world if full of calculator examples where everything is an expression; I am trying a little more: to introduce the "var" keyword; so I got stuck. Here what I am trying to parse:

var x;
x = 3 + 4;
print x;

and later to complicate it with:

var x = 2+3, y = x+5, xyz;
xyz = x + y + 3;
print xyz;

Is "var x =" an expression like 2+3? Or ", y = " also an expression?

Edited - Added extra info:

I am at the very, very beginning:

%union 
{
    char *chars;
}

%token TOKEN_VAR
%token <chars> TOKEN_LITERAL
%token ';' TOKEN_SEMICOLON 

%%
input
: varStatement {;}
;

varStatement 
: TOKEN_VAR TOKEN_LITERAL TOKEN_SEMICOLON {AddStatement(new VarStatement($2));}
;

%%

Trying to parse: "var xz; var abc;" I have 2 problems:

  • the $2 is always null
  • the parser stops after var xz;
2

There are 2 answers

0
rici On BEST ANSWER

I don't think StackOverflow is the proper forum for a complete introduction to writing context-free grammars, but perhaps this is enough to get you started.

A grammar consists of a number of rules, each of which has the form "An X can be an a followed by a b followed by …". These rules can be recursive, and that is the only way to express concepts like arbitrary repetition. In other words, since we can't say "A list is any number of _expression_s separated by _Comma_s", what we say instead is: "A list can be an expression, or it can be a list followed by a Comma followed by an expression." We usually write that as follows:

list: expression            /* A list can be an expression */
    | list ',' expression   /* or a list, a comma, and an expression */

The | is just an abbreviation. We could have written:

list: expression            /* A list can be an expression */
list: list ',' expression   /* or a list, a comma, and an expression */

Note the use of ',' to represent "a comma". In bison, you don't have to think up names for tokens which consist of a single character; you can simply use the character itself in quotes. (In flex, to return that token, you do the same thing: { return ','; }. That works because in C, a single-quoted character is an integer constant.)

For multicharacter tokens -- keywords like var, for example -- bison lets you use a double-quoted string provided you have declared a token name. So for example, you could write:

/* Declare the token name and the string representation */
 %token TOKEN_VAR "var"

 %%
 var_statement: "var" declaration_list ';'
 declaration_list: declaration
                 | declaration_list ',' declaration
 declaration: IDENTIFIER
            | IDENTIFIER '=' expression 

Now, a "program" is also a list of statements, but since the statements are terminated by semicolons, we don't need any punctuation in the list. That's a trivial difference:

 program: statement
        | program statement
 statement: var_statement
          | print_statement
          | assignment_statement

There's a lot more to fill in here (expression, for example), but hopefully that gives you some ideas.

0
CoffeDeveloper On

It is HARD giving a particular bison/yacc configuration without more constraints or a formal grammar. In particular YOU DECIDE if "var x=" should be an expression (that evaluates as "x" and as side effect add "x" to the environment) or as a statement (evaluated without return type) or something else.

In general, you better try to write yourself few simple parsers without the support of code generation tools. A possible grammar for your language is:

Statement ::= ExpStat | VarStat | PrintStat
VarStat ::= var InitializerList;
InitializerList ::= (Ident | Ident = Exp )( ,Ident | ,Ident = Exp)*
ExpStat ::= Exp
PrintStat ::= Print Exp;
Exp ::= Exp BinaryOp Exp | Number | Ident
BinaryOp ::= + | =
Number ::= [1-9]+[0-9]*
Ident ::= [a-z]*

Where you add the additional constraint that number must be greater or equal than 0 and lesser or equal than 2^32-1 and the binary operation + is in example "left-associative", while binary operation "=" is right-associative (not saying that is a good idea, just saying it is one of many possible chances).

You can probably code the parser, the typechecker and the evaluator of such grammar in one afternoon or two (and maybe a virtual machine to speedup the whole in a week), but first you need to have a PRECISE idea of how your language should work. Code generation tools are noteworthy to not be the silver bullet, in particular if you are not able to write your own parser properly or to elaborate a grammar on your own, it is likely you'll incurr with even more issues when using code generators because they hide a lot of details and add extra complexity.