I'm having a hard time understanding what I'm supposed to do. The only thing I've figured out is I need to use yacc on the cminus.y file. I'm totally confused about everything after that. Can someone explain this to me differently so that I can understand what I need to do?
INTRODUCTION:
We will use lex/flex and yacc/Bison to generate an LALR parser. I will give you a file called cminus.y. This is a yacc format grammar file for a simple C-like language called C-minus, from the book Compiler Construction by Kenneth C. Louden. I think the grammar should be fairly obvious. The Yahoo group has links to several descriptions of how to use yacc. Now that you know flex it should be fairly easy to learn yacc. The only base type is int. An int is 4 bytes. Booleans are handled as ints, as in C. (Actually the grammar allows you to declare a variable as a type void, but let's not do that.) You can have one-dimensional arrays. There are no pointers, but references to array elements should be treated as pointers (as in C). The language provides for assignment, IF-ELSE, WHILE, and function calls and returns. We want our compiler to output MIPS assembly code, and then we will be able to run it on SPIM. For a simple compiler like this with no optimization, an IR should not be necessary. We can output assembly code directly in one pass. However, our first step is to generate a symbol table.
SYMBOL TABLE:
I like Dr. Barrett’s approach here, which uses a lot of pointers to handle objects of different types. In essence the elements of the symbol table are identifier, type and pointer to an attribute object. The structure of the attribute object will differ according to the type. We only have a small number of types to deal with. I suggest using a linear search to find symbols in the table, at least to start. You can change it to hashing later if you want better performance. (If you want to keep in C, you can do dynamic allocation of objects using malloc.) First you need to make a list of all the different types of symbols that there are—there are not many—and what attributes would be necessary for each. Be sure to allow for new attributes to be added, because we have not covered all the issues yet. Looking at the grammar, the question of parameter lists for functions is a place where some thought needs to be put into the design. I suggest more symbol table entries and pointers.
TESTING:
The grammar is correct, so taking the existing grammar as it is and generating a parser, the parser will accept a correct C-minus program but it won’t produce any output, because there are no code snippets associated with the rules. We want to add code snippets to build the symbol table and print information as it does so. When an identifier is declared, you should print the information being entered into the symbol table. If a previous declaration of the same symbol in the same scope is found, an error message should be printed. When an identifier is referenced, you should look it up in the table to make sure it is there. An error message should be printed if it has not been declared in the current scope. When closing a scope, warnings should be generated for unreferenced identifiers. Your test input should be a correctly formed C-minus program, but at this point nothing much will happen on most of the production rules.
SCOPING:
The most basic approach has a global scope and a scope for each function declared. The language allows declarations within any compound statement, i.e. scope nesting. Implementing this will require some kind of scope numbering or stacking scheme. (Stacking works best for a one-pass compiler, which is what we are building.)
(disclaimer) I don't have much experience with compiler classes (as in school courses on compilers) but here's what I understand:
1) You need to use the tools mentioned to create a parser which, when given input will tell the user if the input is a correct program as to the grammar defined in cminus.y. I've never used yacc/bison so I don't know how it is done, but this is what seems to be done:
2) It also seems that the output needs to check for variable consistency (ie, you can't use a variable you haven't declared same as any programming language), which is done via a symbol table. In short, every time something is declared you add it to the symbol table. When you encounter an identifier, if it is not one of the language identifiers (like
if
orwhile
orfor
), you'll look it up in the symbol table to determine if it has been declared. If it is there, go on. If it's not - print some-sort-of-errorNote: point(2) there is a simplified take on a symbol table; in reality there's more to them than I just wrote but that should get you started.
I'd start with yacc examples - see what yacc can do and how it does it. I guess there must be some big example-complete-with-symbol-table out there which you can read to understand further.
Example:
Let's take input A:
And input B:
and assume we're using C syntax for parsing. Your parser should deem Input A all right, but should yell "b is undeclared" for Input B.