Lemon parser parsing 0 token


I'm having a problem using (reentrant) Flex + Lemon for parsing. I'm using a simple grammar and lexer here. When I run it, I enter a number followed by EOF (Ctrl-D). The output reads:

89

found int of .
AST=0.

The first line is the number I entered. In theory, the AST value should be the sum of everything I entered.

EDIT: when I call Parse() manually it runs correctly.

Also, lemon appears to run the atom ::= INT rule even when the token is 0 (the stop token). Why is this? I'm very confused about this behavior and I can't find any good documentation.


There are 2 answers

semisight (best answer)

Okay, I figured it out. The reason is that there is a particularly nasty (and poorly documented) interaction going on between flex and lemon.

To save memory, Lemon does not copy the token value you pass in; it pushes that value (here, a char pointer) onto its internal stack as-is. Flex also tries to save memory: it reuses its buffer, so the text that yyget_text points to changes as it lexes more input. The offending line in my example is:

// in the do loop of main.c...
Parse(parser, token, yyget_text(lexer));

This should be:

Parse(parser, token, strdup(yyget_text(lexer)));

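which ensures that the value Lemon points to when it later reduces the token stack is the same as what you originally passed in. This also explains why atom ::= INT appears to run on token 0: Lemon only performs a reduction once a later Parse() call supplies the lookahead token, so the action for INT fires during the call that delivers the stop token, by which point flex has already moved on.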

(Note: strdup means you'll have to free that memory at some point. Lemon lets you write token "destructors" that can do this, or, if you're building an AST, you can wait until the end of the AST's lifetime.)
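To make the aliasing concrete, here is a minimal sketch that does not use flex or Lemon at all. The names fake_scan, held, push_two, and release_held are hypothetical stand-ins: fake_scan plays the role of yyget_text (it returns a pointer into storage the scanner reuses), and held plays the role of Lemon's stack (it keeps whatever pointer it was given). It assumes a POSIX system for strdup.

```c
#define _POSIX_C_SOURCE 200809L  /* for strdup */
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for flex: one shared buffer, overwritten on every scan. */
static char scan_buf[32];

static char *fake_scan(const char *lexeme) {
    assert(strlen(lexeme) < sizeof scan_buf);
    strcpy(scan_buf, lexeme);
    return scan_buf;  /* like yyget_text(): points into scanner-owned storage */
}

/* Hypothetical stand-in for Lemon's stack: it keeps the pointer, not the characters. */
static char *held[2];

/* Scan two lexemes and hold both; when copy is nonzero, duplicate each one first. */
static void push_two(const char *a, const char *b, int copy) {
    char *p = fake_scan(a);
    held[0] = copy ? strdup(p) : p;
    p = fake_scan(b);
    held[1] = copy ? strdup(p) : p;
}

/* Frees the held copies, as a token destructor or AST teardown would.
   Only valid after push_two(..., 1); the uncopied pointers are not heap memory. */
static void release_held(void) {
    free(held[0]);
    free(held[1]);
    held[0] = held[1] = NULL;
}
```

After push_two("89", "", 0), held[0] is an empty string, because the second scan overwrote the buffer that the first pointer still refers to. With push_two("89", "", 1), held[0] is still "89", at the cost of a later release_held() call.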

Michael Bishop

You might also try making a token type that contains a pointer to the string and the length of the string. I've had success with this.

token.h

#ifndef Token_h
#define Token_h

typedef struct Token {
    int code;
    char * string;
    int string_length;
} Token;

#endif // Token_h

main.c

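#include <stdio.h>
#include <stdlib.h>
#include "token.h"
/* Also include the header flex generates for your reentrant scanner
   (its name depends on your build); it declares yyscan_t, yylex, etc. */

/* Lemon does not emit prototypes for these. The Token parameter assumes
   the grammar declares %token_type {Token}. */
void *ParseAlloc(void *(*mallocProc)(size_t));
void Parse(void *parser, int code, Token token);
void ParseFree(void *parser, void (*freeProc)(void *));

int main(int argc, char** argv) {
    // Set up the scanner
    yyscan_t scanner;
    yylex_init(&scanner);
    yyset_in(stdin, scanner);

    // Set up the parser
    void* parser = ParseAlloc(malloc);

    // Do it!
    Token t;
    do {
        t.code = yylex(scanner);
        if (t.code < 0) break;  // don't feed a scanner error code to the parser
        // t.string points into flex's buffer; it is not a private copy
        t.string = yyget_text(scanner);
        t.string_length = yyget_leng(scanner);
        Parse(parser, t.code, t);  // code 0 (end of input) is passed through too
    } while (t.code > 0);

    if (-1 == t.code) {
        fprintf(stderr, "The scanner encountered an error.\n");
    }

    // Cleanup the scanner and parser
    yylex_destroy(scanner);
    ParseFree(parser, free);
    return 0;
}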

language.y (excerpt)

language.l (excerpt)

See my printf statement there? I'm using the string and the length to print out my token.
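Since the excerpts are not reproduced here, the following is only a sketch of what printing from a pointer plus a length can look like, not the author's actual code. It repeats the Token struct from token.h so it stands alone; format_token and its message text are hypothetical. The "%.*s" conversion takes the length as an int argument and stops there, so the text does not need to be NUL-terminated at the token boundary.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Same layout as token.h above, repeated so the sketch is self-contained. */
typedef struct Token {
    int code;
    char *string;
    int string_length;
} Token;

/* Hypothetical helper: writes a description of the token into out.
   "%.*s" prints at most string_length characters starting at string.
   Returns snprintf's result (the length the full message needs). */
static int format_token(char *out, size_t out_size, const Token *t) {
    assert(t != NULL && t->string_length >= 0);
    return snprintf(out, out_size, "found token %d: '%.*s'",
                    t->code, t->string_length, t->string);
}
```

For example, with the input buffer "12+34" and a token of code 1 covering the first two characters, format_token produces the text found token 1: '12' even though the character after the token is '+', not a terminator.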