How to use the "$$" value returned by a PackCC parser?


This is a minimal PackCC grammar example.

I am trying to retrieve and print the $$ value after parsing. The word is matched, but the printf call displays only garbage.

%value "char*"

word <- < [a-z]+[\n]* >  {$$ = $1;}

%%

int main(void)
{
    char* val = "Value";
    // Create a file to parse.
    FILE* f = freopen("text.txt", "w", stdin);

    if(f != NULL) {
        // Write the text to parse.
        fprintf(f, "example\n");
        // Set the file in read mode.
        f = freopen("text.txt", "r", stdin);
        pcc_context_t *ctx = pcc_create(NULL);
        // I expect val to receive the "$$" value from the parse.
        while(pcc_parse(ctx, &val));
        printf("val: %s\n",val);
        pcc_destroy(ctx);
        fclose(f);
    }
    else {
        puts("File is NULL");
    }
    return 0;
}

The PackCC doc says that $$ is:

The output variable, to which the result of the rule is stored.

And it says that the pcc_parse function:

Parses an input text (from standard input by default) and returns the result in ret. The ret can be NULL if no output data is needed. This function returns 0 if no text is left to be parsed, or a non-0 value otherwise.

Best answer, by rici:

There is no problem with your use of $$, in the sense that the char * value stored in $$ by the word action is faithfully returned into val.

The problem is that the char* value points to dynamically allocated memory, and by the time the parser returns, that memory has already been freed. So the pointer returned into val is a dangling pointer, and by the time printf is called, the memory region has been reused for some other object.

The documentation for PackCC, such as it is, does not go into any detail about its memory-management strategy, so it's not really clear how long the $1 pointer in a rule remains valid. I think it would be safest to assume that it is only valid until the end of the last action in the rule. But it is certainly not reasonable to assume that the pointer will outlast a call to pcc_parse. After all, the parser has no way to know that you have stored the pointer outside of the parser context. Nor can the parser rely on the programmer to free the capture strings produced during rules; having to free every capture, even the ones never used, would be a severe inconvenience. To avoid memory leaks, the parser therefore must free its capture buffers itself.

The problem is easy to see if you are able to use valgrind or some similar tool. (Valgrind is available for most Linux distributions and for OS X since v10.9.x. Other platforms might be supported.) Running your parser under valgrind produced the following error report (truncated):

$ valgrind --leak-check=full ./test3
==2763== Memcheck, a memory error detector
==2763== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==2763== Using Valgrind-3.13.0 and LibVEX; rerun with -h for copyright info
==2763== Command: ./test3
==2763== 
==2763== Invalid read of size 1
==2763==    at 0x4C34CF2: strlen (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==2763==    by 0x4E9B5D2: vfprintf (vfprintf.c:1643)
==2763==    by 0x4F7017B: __printf_chk (printf_chk.c:35)
==2763==    by 0x10A32D: printf (stdio2.h:104)
==2763==    by 0x10A32D: main (test3.c:1013)
==2763==  Address 0x5232e20 is 0 bytes inside a block of size 9 free'd
==2763==    at 0x4C32D3B: free (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==2763==    by 0x109498: pcc_capture_table__term (test3.c:339)
==2763==    by 0x1096E3: pcc_thunk_chunk__destroy (test3.c:441)
==2763==    by 0x10974F: pcc_lr_answer__destroy (test3.c:557)
==2763==    by 0x109818: pcc_lr_memo_map__term (test3.c:602)
==2763==    by 0x10985F: pcc_lr_table_entry__destroy (test3.c:619)
==2763==    by 0x109BB8: pcc_lr_table__shift (test3.c:680)
==2763==    by 0x109C1C: pcc_commit_buffer (test3.c:757)
==2763==    by 0x10A22C: pcc_parse (test3.c:986)
==2763==    by 0x10A314: main (test3.c:1011)
==2763==  Block was alloc'd at
==2763==    at 0x4C31B0F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==2763==    by 0x108C9D: pcc_malloc_e (test3.c:225)
==2763==    by 0x108FF3: pcc_strndup_e (test3.c:252)
==2763==    by 0x109038: pcc_get_capture_string (test3.c:764)
==2763==    by 0x10904E: pcc_action_word_0 (test3.c:892)
==2763==    by 0x108C56: pcc_do_action (test3.c:872)
==2763==    by 0x108C87: pcc_do_action (test3.c:875)
==2763==    by 0x10A224: pcc_parse (test3.c:983)
==2763==    by 0x10A314: main (test3.c:1011)

That's a lot to go through, but it shows that there was an attempt to read the first byte of a 9-byte dynamically allocated block that had already been freed ("Address 0x5232e20 is 0 bytes inside a block of size 9 free'd"). Nine bytes is exactly the size of a copy of the matched text "example\n": eight characters plus the terminating NUL. Furthermore, the backtrace shows that the error was triggered by a call to strlen, which had been called by printf, which was called from your main function. (Unfortunately, PackCC does not issue #line directives, making it impossible to correlate the line numbers in the generated C parser with the line numbers in the original PEG grammar file. In this case, however, it's clear where the printf is, since there's really only one possibility inside the main function.)

Valgrind also shows you where the memory was freed and where it was allocated. Although you'd need a copy of the generated parser handy to see how all the parts fit together, the function names in the call traces are somewhat helpful: the block was allocated by pcc_strndup_e, called from pcc_get_capture_string inside the word action, and freed by pcc_capture_table__term during pcc_commit_buffer, all before pcc_parse returned.

The solution is basically the same as the way you must handle yytext in a parser that relies on a (f)lex-based scanner: since the string passed to the action lives in memory whose lifetime is about to end, any token you want to use later must be copied. The simplest way to do that is with strdup (or an equivalent, if you can't use standard POSIX interfaces), changing the action to:

word <- < [a-z]+[\n]* >  {$$ = strdup($1);}

Once you do this, the word "example" will be printed as expected (including the newline character that terminates it).

You must also remember to free the copies you make, to avoid leaking memory. Valgrind will help you detect memory leaks as well, so it can catch errors caused by forgetting to do so.