I am trying to learn Flex/Bison, and the world is full of calculator examples where everything is an expression. I am trying to go a little further and introduce the "var" keyword, and I got stuck. Here is what I am trying to parse:
var x;
x = 3 + 4;
print x;
and later to complicate it with:
var x = 2+3, y = x+5, xyz;
xyz = x + y + 3;
print xyz;
Is "var x =" an expression like 2+3? Or is ", y = " also an expression?
Edited - Added extra info:
I am at the very, very beginning:
%union
{
char *chars;
}
%token TOKEN_VAR
%token <chars> TOKEN_LITERAL
%token ';' TOKEN_SEMICOLON
%%
input
: varStatement {;}
;
varStatement
: TOKEN_VAR TOKEN_LITERAL TOKEN_SEMICOLON {AddStatement(new VarStatement($2));}
;
%%
Trying to parse "var xz; var abc;", I have two problems:
- the $2 is always null
- the parser stops after var xz;
I don't think StackOverflow is the proper forum for a complete introduction to writing context-free grammars, but perhaps this is enough to get you started.
A grammar consists of a number of rules, each of which has the form "An X can be an a followed by a b followed by …". These rules can be recursive, and that is the only way to express concepts like arbitrary repetition. In other words, since we can't say "A list is any number of expressions separated by commas", what we say instead is: "A list can be an expression, or it can be a list followed by a comma followed by an expression." We usually write that as follows:
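In bison syntax, that rule could be sketched like this (`expression` is assumed to be defined by other rules that are not shown here):

```yacc
/* A list is a single expression, or a shorter list
   followed by a comma and one more expression. */
list
    : expression
    | list ',' expression
    ;
```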
The `|` is just an abbreviation. We could have written the same thing as two separate rules: `list: expression;` and `list: list ',' expression;`.

Note the use of `','` to represent "a comma". In bison, you don't have to think up names for tokens which consist of a single character; you can simply use the character itself in quotes. (In flex, to return that token, you do the same thing: `{ return ','; }`. That works because in C, a single-quoted character is an integer constant.)

For multicharacter tokens -- keywords like `var`, for example -- bison lets you use a double-quoted string, provided you have declared a token name. So, for example, after declaring `%token TOKEN_VAR "var"`, you could write `"var"` in a rule wherever you would otherwise write `TOKEN_VAR`.

Now, a "program" is also a list of statements, but since the statements are terminated by semicolons, we don't need any punctuation in the list. That's a trivial difference:
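Here is a sketch of how these pieces might fit together. It is a fragment, not a complete grammar file. The nonterminal names (`program`, `statement`, `var_statement`, `declaration_list`, `declaration`) and the `IDENTIFIER` token are placeholders chosen for illustration, and `expression` is deliberately left undefined:

```yacc
%token TOKEN_VAR "var"      /* lets rules refer to the keyword as "var" */
%token IDENTIFIER           /* hypothetical token for variable names */

%%

/* A program is one or more statements; no separator is needed
   because each statement ends with its own ';'. */
program
    : statement
    | program statement
    ;

statement
    : var_statement
    /* assignment and print statements would be further alternatives */
    ;

/* "var" followed by a comma-separated list of declarations. */
var_statement
    : "var" declaration_list ';'
    ;

declaration_list
    : declaration
    | declaration_list ',' declaration
    ;

/* Each declared variable may have an initializer. */
declaration
    : IDENTIFIER
    | IDENTIFIER '=' expression
    ;
```

With rules along these lines, `var x = 2+3, y = x+5, xyz;` would be a single `var_statement` whose list holds three declarations, two of them with initializers. Only the parts after each `=` are expressions.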
There's a lot more to fill in here (`expression`, for example), but hopefully that gives you some ideas.