Java CUP and JFlex Interaction


I am considering using the CUP parser generator for a project. To parse some constructs of the language I will be compiling correctly, the lexer (generated by JFlex) needs to consult the parser's symbol table (not the parse table; I mean the table in which I will store information about identifiers) so that it returns the correct token type when its next_token() method is invoked. Since the contents of the symbol table at any moment depend on how much of the program text the parser has already processed, this only works if next_token() is invoked "in lockstep" with the parser. In other words, it works if the parser calls the lexer whenever it needs another token, but not if (for example) a parallel thread invokes the lexer and buffers tokens in a queue.
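A minimal sketch of the lockstep arrangement described above, assuming a toy C-like language in which `typedef NAME ;` declares a type name. Everything here is hypothetical: `SymbolTable`, `Lexer`, `Kind` and the token loop are illustrative stand-ins, not the JFlex or CUP APIs. The lexer decides between `TYPE_NAME` and `IDENT` only at the moment it is asked for a token, and `tokenizeAhead` shows what goes wrong when the lexer runs ahead of the parser.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class FeedbackLexerDemo {
    // Hypothetical token kinds for a toy language with "typedef NAME ;" declarations.
    enum Kind { TYPEDEF, TYPE_NAME, IDENT, SEMI, EOF }

    record Token(Kind kind, String text) {}

    // Symbol table written by the parser and read by the lexer.
    static final class SymbolTable {
        private final Set<String> typeNames = new HashSet<>();

        void declareType(String name) { typeNames.add(name); }

        boolean isType(String name) { return typeNames.contains(name); }
    }

    // Stand-in for a JFlex scanner: each word is classified only when requested.
    static final class Lexer {
        private final String[] words;
        private final SymbolTable table;
        private int pos = 0;

        Lexer(String input, SymbolTable table) {
            this.words = input.trim().split("\\s+");
            this.table = table;
        }

        Token nextToken() {
            if (pos >= words.length) return new Token(Kind.EOF, "");
            String w = words[pos++];
            if (w.equals("typedef")) return new Token(Kind.TYPEDEF, w);
            if (w.equals(";")) return new Token(Kind.SEMI, w);
            // The token type depends on what has been declared up to this moment.
            return new Token(table.isType(w) ? Kind.TYPE_NAME : Kind.IDENT, w);
        }
    }

    // Pulls one token at a time, so every declaration is visible to later tokens.
    static List<Kind> parse(String input) {
        SymbolTable table = new SymbolTable();
        Lexer lexer = new Lexer(input, table);
        List<Kind> seen = new ArrayList<>();
        Token prev = null;
        for (Token cur = lexer.nextToken(); cur.kind() != Kind.EOF; cur = lexer.nextToken()) {
            seen.add(cur.kind());
            // Semantic action: the identifier right after "typedef" becomes a type name.
            if (prev != null && prev.kind() == Kind.TYPEDEF && cur.kind() == Kind.IDENT) {
                table.declareType(cur.text());
            }
            prev = cur;
        }
        return seen;
    }

    // Contrast: a lexer that runs ahead (e.g. a buffering thread) sees an empty table.
    static List<Kind> tokenizeAhead(String input) {
        Lexer lexer = new Lexer(input, new SymbolTable());
        List<Kind> kinds = new ArrayList<>();
        for (Token t = lexer.nextToken(); t.kind() != Kind.EOF; t = lexer.nextToken()) {
            kinds.add(t.kind());
        }
        return kinds;
    }

    public static void main(String[] args) {
        String src = "typedef T ; T x ;";
        System.out.println(parse(src));         // [TYPEDEF, IDENT, SEMI, TYPE_NAME, IDENT, SEMI]
        System.out.println(tokenizeAhead(src)); // [TYPEDEF, IDENT, SEMI, IDENT, IDENT, SEMI]
    }
}
```

In a real CUP parser the next token is scanned right after each shift, so an action attached to a rule that ends in `;` runs after the token following `;` has already been read; the symbol table has to be updated early enough to account for that one token of lookahead.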

The question is thus: How does CUP call the lexer? Does it call it whenever it needs the next token? I could of course just write a CUP grammar specification and inspect the generated parser's source file to see what's going on, but that may be more work than necessary. I couldn't find any information on this on relevant websites.

Thanks a lot for any help you can offer!

Answer by capagira87

Maybe this reply comes too late for you, but it could be useful for other users. The first thing to know is that a parser can't do anything without a scanner; in fact, the scanner is the first argument of the parser's constructor. Compiling the .cup file produces a .java file containing the parser class (named parser by default; CUP's -parser option lets you choose another name), plus sym.java with the token constants. Let's suppose you named the parser class Parser. Then, in the main class of your project, add the following lines:

// Scanner is the JFlex-generated lexer; s is the input file name (needs import java.io.FileReader)
Parser p = new Parser(new Scanner(new FileReader(s)));
p.parse();

You should put this code in a try-catch block, because parse() is declared to throw Exception. Calling parse() starts the parser, which in turn calls the scanner's next_token() method to obtain tokens and checks whether they match the grammar rules you wrote.

Answer by user2625942

I finished implementing my parser and scanner a while ago. Here's what I found:

CUP does indeed invoke the scanner as and when it needs a token. It always holds exactly one token beyond what has been recognized so far (the lookahead token); there is no fancy buffering of tokens ahead of time.
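To make the "one token ahead" behaviour concrete, here is a toy shift-reduce parser for the hypothetical grammar `E ::= E + NUM | NUM` with a scanner that logs every call. It is not CUP's generated code; it only mirrors the order of operations described above, scanning the next token immediately after each shift, so the log shows that the parser never holds more than one unconsumed token.

```java
import java.util.ArrayList;
import java.util.List;

public class LookaheadTrace {
    // Hypothetical scanner over space-separated words that logs every call.
    static final class LoggingScanner {
        private final String[] words;
        private final List<String> log;
        private int pos = 0;

        LoggingScanner(String input, List<String> log) {
            this.words = input.trim().split("\\s+");
            this.log = log;
        }

        String nextToken() {
            String t = pos < words.length ? words[pos++] : "EOF";
            log.add("scan " + t);
            return t;
        }
    }

    // Toy shift-reduce parser for  E ::= E + NUM | NUM  that sums the numbers.
    static List<String> parse(String input) {
        List<String> log = new ArrayList<>();
        LoggingScanner scanner = new LoggingScanner(input, log);
        List<String> syms = new ArrayList<>();  // symbol stack
        List<Integer> vals = new ArrayList<>(); // semantic-value stack
        String la = scanner.nextToken();        // the single lookahead token
        while (true) {
            int n = syms.size();
            if (n >= 3 && syms.subList(n - 3, n).equals(List.of("E", "+", "NUM"))) {
                int sum = vals.get(n - 3) + vals.get(n - 1);
                syms.subList(n - 3, n).clear();
                vals.subList(n - 3, n).clear();
                syms.add("E");
                vals.add(sum);
                log.add("reduce E ::= E + NUM (lookahead " + la + ")");
            } else if (n == 1 && syms.get(0).equals("NUM")) {
                syms.set(0, "E");
                log.add("reduce E ::= NUM (lookahead " + la + ")");
            } else if (la.equals("EOF")) {
                if (n == 1 && syms.get(0).equals("E")) {
                    log.add("accept " + vals.get(0));
                    return log;
                }
                throw new IllegalStateException("syntax error");
            } else {
                // Shift the lookahead, then fetch exactly one new token.
                boolean plus = la.equals("+");
                syms.add(plus ? "+" : "NUM");
                vals.add(plus ? 0 : Integer.parseInt(la));
                log.add("shift " + la);
                la = scanner.nextToken();
            }
        }
    }

    public static void main(String[] args) {
        parse("1 + 2").forEach(System.out::println);
    }
}
```

For the input `1 + 2` the log reads: scan 1, shift 1, scan +, reduce E ::= NUM (lookahead +), shift +, scan 2, shift 2, scan EOF, reduce E ::= E + NUM (lookahead EOF), accept 3. Each reduction runs after exactly one extra token has been scanned, which is the window a symbol-table-driven lexer has to live with.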

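That being said, it can be tricky to switch lexer states from within the parser, because the semantic actions needed to do so can give rise to many grammar conflicts. I suspect this has to do with the way CUP represents semantic actions embedded within productions: each such mid-rule action becomes a new nonterminal with an empty production, which the parser must reduce before it has seen the rest of the rule. In the end this forced me to abandon my initial design anyway, though not for the reason I had feared.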

Hope this helps someone!

Answer by adityagerrard

I don't know how late I am in answering this question, but I'm building a parser as part of my coursework. I'm using Lex and CUP for the lexer and the parser, respectively. Here is my driver class, which calls the parser; the parser, in turn, calls the scanner only when it needs the next token:

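import java.io.FileReader;

public class Driver {
    public static void main(String[] args) throws Exception {
        // construct the lexer for the file named on the command line
        Yylex lexer = new Yylex(new FileReader(args[0]));
        // create the parser
        Parser parser = new Parser(lexer);
        // and parse
        parser.parse();
    }
}

Internally, parse() does the following (simplified excerpt of CUP's java_cup.runtime.lr_parser, which the generated parser extends):

public Symbol parse() throws Exception {
    ...
    // read the first token
    this.cur_token = this.scan();
    ...
    // after every shift, the next token is read the same way
    ...
}

public Symbol scan() throws Exception {
    // ask the scanner for exactly one more token
    Symbol sym = this.getScanner().next_token();
    // a null token is treated as end of input
    return sym != null ? sym : this.getSymbolFactory().newSymbol("END_OF_FILE", this.EOF_sym());
}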
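The fallback in scan() can be tried in isolation. The sketch below assumes simplified, hypothetical stand-ins for CUP's `Symbol` and `Scanner` types (the real `Scanner.next_token()` is declared to throw `Exception`, and the real EOF symbol is built by the symbol factory); the token numbers 2 and 3 are made up, and EOF is taken to be 0, the number CUP's generated sym class normally uses.

```java
import java.util.Iterator;
import java.util.List;

public class ScanFallbackDemo {
    // Hypothetical stand-ins for java_cup.runtime.Symbol and java_cup.runtime.Scanner.
    record Symbol(int sym, String name) {}

    interface Scanner {
        Symbol next_token(); // the real CUP interface declares "throws Exception"
    }

    // Assumed EOF terminal number (CUP's generated sym class normally defines EOF = 0).
    static final int EOF_SYM = 0;

    // Same logic as the scan() excerpt above: a null token becomes an EOF symbol.
    static Symbol scan(Scanner scanner) {
        Symbol s = scanner.next_token();
        return s != null ? s : new Symbol(EOF_SYM, "END_OF_FILE");
    }

    public static void main(String[] args) {
        Iterator<Symbol> tokens = List.of(new Symbol(2, "ID"), new Symbol(3, "SEMI")).iterator();
        // This scanner returns null once its input is exhausted.
        Scanner scanner = () -> tokens.hasNext() ? tokens.next() : null;
        for (int i = 0; i < 3; i++) {
            System.out.println(scan(scanner));
        }
    }
}
```

So a scanner that simply returns null at the end of its input still ends the parse cleanly, because the parser substitutes its own EOF symbol.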