This is a minimal PackCC grammar example. I'm trying to retrieve and print the $$ value after parsing. The word is matched, but only garbage is displayed by the printf call.
%value "char*"
word <- < [a-z]+[\n]* > {$$ = $1;}
%%
int main(void)
{
    char* val = "Value";
    // Create a file to parse.
    FILE* f = freopen("text.txt", "w", stdin);
    if(f != NULL) {
        // Write the text to parse.
        fprintf(f, "example\n");
        // Set the file in read mode.
        f = freopen("text.txt", "r", stdin);
        pcc_context_t *ctx = pcc_create(NULL);
        // I expect val to receive the "$$" value from the parse.
        while(pcc_parse(ctx, &val));
        printf("val: %s\n",val);
        pcc_destroy(ctx);
        fclose(f);
    }
    else {
        puts("File is NULL");
    }
    return 0;
}
The PackCC doc says that $$ is: "The output variable, to which the result of the rule is stored."

And it says that the pcc_parse function: "Parses an input text (from standard input by default) and returns the result in ret. The ret can be NULL if no output data is needed. This function returns 0 if no text is left to be parsed, or a non-0 value otherwise."
There is no problem with your use of $$, in the sense that the char * value stored in $$ by the word action is faithfully returned into val.

The problem is that the char * value is a pointer to dynamically-allocated memory, and by the time the parser returns, that dynamically-allocated memory has already been freed. So the pointer returned into val is a dangling pointer, and by the time printf is called, the memory region has been used for some other object.

The documentation for PackCC, such as it is, does not go into any detail about its memory management strategy, so it's not really clear how long the $1 pointer in a rule is valid. I think it would be safest to assume that it is only valid until the end of the last action in the rule. But it is certainly not reasonable to assume that the pointer will outlast a call to pcc_parse. After all, the parser has no way to know that you have stored the pointer outside of the parser context. The parser cannot rely on the programmer to free capture strings produced during rules; having to free every capture, even the ones never used, would be a severe inconvenience. To avoid memory leaks, the parser therefore must free its capture buffers.

The problem is easy to see if you are able to use valgrind or some similar tool. (Valgrind is available for most Linux distributions and for OS X since v10.9.x. Other platforms might be supported.) Running your parser under valgrind produced the following error report (truncated):
That's a lot to go through, but it shows that there was an attempt to use the first byte of a 9-byte dynamically-allocated memory region which had already been freed. ("Address 0x5232e20 is 0 bytes inside a block of size 9 free'd".) Furthermore, the backtrace shows that the error was triggered by a call to strlen, which had been called by printf; printf was called from your main function. (Unfortunately, PackCC does not issue #line directives, making it impossible to correlate the line numbers in the generated C parser with the line numbers in the original PEG grammar file. However, in this case it's clear where the printf is, since there's really only one possibility inside the main function.) Valgrind also shows you where the memory was dynamically allocated; although you'd have to have a copy of the generated parser handy to see how all the parts fit together, the names of the functions in the call trace are somewhat helpful.

The solution is basically the same as the way you must handle yytext in a parser which relies on (f)lex-based scanners: since the string the action receives in $1 is in memory whose lifetime is about to end, any token which you want to use later must be copied. The simplest way to do that is to use strdup (or an equivalent, if you're not able to use standard Posix interfaces), changing the action to:

word <- < [a-z]+[\n]* > {$$ = strdup($1);}

Once you do this, the word "example" will be printed as expected (including the newline character which terminates it).

You also must remember to free the copies you have made in order to avoid leaking memory. Valgrind can also detect memory leaks, so it will help you catch errors resulting from forgetting to do so.