Why aren't the parser actions preceding an error performed?


This PackCC grammar is just a list of string literals. When a syntax error is found, actions are skipped, even if the error is in the last term of the file. This is a problem because I want to compute the line/column location while parsing whitespace.

%source {
#include <stdio.h>
#include <stdlib.h>
}

line <- 
    list _ EOL

list <- 
    list _ ',' _ string 
    / _ string

string <- 
    '"'  ~{puts("unopened string");} (!'"' !EOL .)* '"'   
     # The error action here is not skipped

_ <- 
    [ \t]* {puts("_ match");} 
    # This action doesn't happen if there is an error. In my real project, the cursor location is computed here.

EOL <- 
      ('\n'
      / '\r\n' 
      / '\r' ) 

%%

int main(void)
{
  pcc_context_t* ctx = pcc_create(NULL);
  while(pcc_parse(ctx, NULL));
  pcc_destroy(ctx);
  return 0;
}

The test file "test.txt" contains just "a", "b", c"

Create the parser with packcc theAboveFile.peg, compile the generated .c file, and run it with the test file redirected to standard input: ./a.out < test.txt

As you can see, there is an error in the last string, but I can't use actions to compute the line/column location because the actions are skipped for some reason.

Answer by rici (best answer)

That shouldn't be puzzling since it's exactly what the documentation says will happen:

Curly braces surround an action. The action is arbitrary C source code to be executed at the end of matching.… Actions are not executed where matching fails.

Why it works like that is probably not in scope here, and the author is probably the best source, but I'd guess that it has something to do with the implementation of backtracking. Unless actions have no side effects, you don't want to run one until you are sure that it's not going to need to be undone. On the other hand, the peg/leg parser which apparently somehow contributed to the design of PackCC does include "predicate" actions, which always run immediately (although I think they are expected to not have side effects), so there is precedent. Perhaps you could file a feature request. Or just use peg/leg instead :-) (I don't know anything about it, either. So don't take that as a recommendation.)
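To make the backtracking guess above concrete, here is a minimal sketch of deferred actions: an action is queued when its pattern matches, dropped if the parser backtracks past it, and run only once the overall match succeeds. Every name here (action_queue, defer, rollback, commit, note_space, note_string) is hypothetical and chosen for illustration; this is not PackCC's actual internal design.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical illustration only; not taken from PackCC's source. */
#define MAX_PENDING 16

typedef void (*action_fn)(char *log);

typedef struct {
    action_fn pending[MAX_PENDING]; /* actions recorded but not yet run */
    size_t count;
} action_queue;

/* Two sample actions with a visible side effect: they append to a log. */
void note_space(char *log)  { strcat(log, "S"); }
void note_string(char *log) { strcat(log, "T"); }

/* A pattern matched: record its action instead of running it. */
void defer(action_queue *q, action_fn fn) {
    assert(q->count < MAX_PENDING);
    q->pending[q->count++] = fn;
}

/* The parser backtracks: forget everything queued since the mark. */
void rollback(action_queue *q, size_t mark) {
    assert(mark <= q->count);
    q->count = mark;
}

/* The whole parse succeeded: run the surviving actions in order. */
void commit(action_queue *q, char *log) {
    for (size_t i = 0; i < q->count; i++) {
        q->pending[i](log);
    }
    q->count = 0;
}
```

In a scheme like this, a syntax error means commit is never called, so none of the queued side effects happen. That would be consistent with the behaviour described in the question.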

I guess the question you really want to ask is "how can I work around this limitation?" Although I'm certainly not a PackCC expert, I did read the documentation about error actions, which you already use. It seems to me that since . matches anything, !. should fail anywhere except at the end of input, so attaching an error action to it should cause the action to always run (how many times, I don't know). I tried replacing your _ rule with the following:

_ <-
    !.~{puts("_ ran");} / [ \t]* {puts("_ match");}

and indeed, the action does seem to run:

$ # Correct input. Note that the "_ ran" action runs six times
$ # before the first execution of the "_ match" action, consistent
$ # with the error action running immediately while the match action
$ # is deferred.
$ ./test4<<<'"a", "b", "c"'
_ ran
_ ran
_ ran
_ ran
_ ran
_ ran
_ match
_ match
_ match
_ match
_ match
_ match


$ # Invalid input. Error action runs five times, presumably because
$ # it doesn't run after the syntax error is signalled.
$ ./test4<<<'"a", "b", c"'
_ ran
_ ran
_ ran
_ ran
_ ran
unopened string
Syntax error

That's not a perfect solution, by any means; the fact that the !. pattern will match at end-of-input (which I think can only happen if the end-of-input is not preceded by a newline) might have repercussions. But it might be enough to get you started.
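A different approach, which does not depend on when actions run, is to compute the line and column after the fact from a byte offset into the input. The sketch below assumes you keep the input in a buffer and can obtain an offset for the failure. How to get that offset from PackCC is left open here. The position type and the locate function are hypothetical helpers, not part of PackCC. Line breaks follow the EOL rule in the question: '\n', '\r\n', or '\r'.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper type; not part of PackCC. */
typedef struct {
    size_t line;   /* 1-based */
    size_t column; /* 1-based, counted in bytes */
} position;

/* Scan buf[0..offset) and count line breaks, treating "\r\n" as one. */
position locate(const char *buf, size_t len, size_t offset) {
    position pos = {1, 1};
    assert(offset <= len);
    for (size_t i = 0; i < offset; i++) {
        if (buf[i] == '\r') {
            /* Swallow the '\n' of a "\r\n" pair if it lies before offset. */
            if (i + 1 < offset && buf[i + 1] == '\n') {
                i++;
            }
            pos.line++;
            pos.column = 1;
        } else if (buf[i] == '\n') {
            pos.line++;
            pos.column = 1;
        } else {
            pos.column++;
        }
    }
    return pos;
}
```

For the test input in the question, the stray c sits at byte offset 10, so locate reports line 1, column 11. Rescanning from the start is linear in the offset, which should be acceptable when it only happens once per reported error.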