ANTLR4 error recovery issues for class bodies


I've found a strange issue with error recovery in ANTLR4. If I take the example grammar from the ANTLR book

grammar simple;

prog:   classDef+ ; // match one or more class definitions

classDef
    :   'class' ID '{' member+ '}' // a class has one or more members
    ;

member
    :   'int' ID ';'                       // field definition
    |   'int' f=ID '(' ID ')' '{' stat '}' // method definition
    ;

stat:   expr ';'
    |   ID '=' expr ';'
    ;

expr:   INT 
    |   ID '(' INT ')'
    ;

INT :   [0-9]+ ;
ID  :   [a-zA-Z]+ ;
WS  :   [ \t\r\n]+ -> skip ;

and use the input

class T {
    y;
    int x;
}

the parser reports the first member as an error (it expects 'int' before 'y').

classDef
 | "class"
 | ID "T"
 | "{"
 |- member
 |   | ID "y" -> error
 |   | ";" -> error
 |- member
 |   | "int"
 |   | ID "x"
 |   | ";"

In this case ANTLR4 recovers from the error in the first member subrule and parses the second member correctly.

But if the member subrule in classDef is changed from the mandatory member+ to the optional member*

classDef
    :   'class' ID '{' member* '}' // a class has zero or more members
    ;

then the parse tree looks like this:

classDef
 | "class" -> error
 | ID "T" -> error
 | "{" -> error
 | ID "y" -> error
 | ";" -> error
 | "int" -> error
 | ID "x" -> error
 | ";" -> error
 | "}" -> error

It seems that error recovery can no longer resolve the problem inside the member subrule.

Obviously using member+ is the way forward as it provides the correct error recovery result. But how do I allow empty class bodies? Am I missing something in the grammar?

The DefaultErrorStrategy class is quite complex, with its token deletions and insertions, and the book explains the theory behind it very well. What I'm missing is how to implement custom error recovery for specific rules.

In my case I would add something like "if { is already consumed, try to find int or }" to optimize the error recovery for this rule.

Is this possible with ANTLR4 error recovery in a reasonable way at all? Or do I have to write a parser by hand to really gain control over error recovery for such cases?

There is 1 answer

Answered by Chris

It is worth noting that, for the given input, the parser never enters the member subrule. The classDef rule fails before it tries to match a member.

Before trying to parse the subrule, the parser calls the sync method on DefaultErrorStrategy. sync recognizes that the next token cannot start the loop and tries to recover by deleting a single token, checking whether the token after it would be acceptable.

In this case it isn't, so an exception is thrown and tokens are consumed until a 'class' token is found (or the input ends). This makes sense, because that is what can follow a classDef, and it is the classDef rule, not the member rule, that is failing at this point.
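To make that sequence concrete, here is a toy model of the decision in plain Java. It is not ANTLR's code: tokens are plain strings, the two sets are written by hand for this grammar, and the class and method names (SyncModel, syncAtLoopEntry) are made up for illustration. It also assumes the token list ends with an "<EOF>" marker.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Toy model only: this is NOT ANTLR's code, just the decision described above.
public class SyncModel {

    // Tokens that may start a member or close the body of classDef.
    static final Set<String> EXPECTED_IN_BODY = Set.of("int", "}");
    // Tokens that may follow a complete classDef (another class, or end of input).
    static final Set<String> FOLLOW_CLASS_DEF = Set.of("class", "<EOF>");

    /**
     * Simulates the check made before entering member*.
     * Returns the tokens that would be discarded; pos is the first token after '{'.
     * Assumes the list ends with "<EOF>".
     */
    static List<String> syncAtLoopEntry(List<String> tokens, int pos) {
        List<String> discarded = new ArrayList<>();
        String la1 = tokens.get(pos);
        if (EXPECTED_IN_BODY.contains(la1)) {
            return discarded; // nothing wrong, enter the loop normally
        }
        // Single-token deletion: drop the current token only if the next one fits.
        String la2 = tokens.get(Math.min(pos + 1, tokens.size() - 1));
        if (EXPECTED_IN_BODY.contains(la2)) {
            discarded.add(la1);
            return discarded;
        }
        // Otherwise classDef itself fails: skip ahead to its follow set.
        for (int i = pos; !FOLLOW_CLASS_DEF.contains(tokens.get(i)); i++) {
            discarded.add(tokens.get(i));
        }
        return discarded;
    }

    public static void main(String[] args) {
        List<String> input = List.of(
            "class", "T", "{", "y", ";", "int", "x", ";", "}", "<EOF>");
        // Index 3 is the first token after '{'.
        System.out.println(syncAtLoopEntry(input, 3)); // prints [y, ;, int, x, ;, }]
    }
}
```

In this model the valid member int x; is thrown away along with y; because the token after y is ';', which is not something the class body can continue with, so deleting y alone does not help.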

It doesn't look simple to do correctly, but if you write a custom subclass of DefaultErrorStrategy, override its sync() method, and install it with parser.setErrorHandler(...), you can get any recovery strategy you like.

Something like the following could be a starting point:

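// Goes inside a subclass of org.antlr.v4.runtime.DefaultErrorStrategy.
@Override
public void sync(Parser recognizer) throws RecognitionException {
  // Skip the sync check while inside classDef so the member rule gets entered.
  if (recognizer.getContext() instanceof simpleParser.ClassDefContext) {
    return;
  }

  // Everywhere else, keep the default behavior.
  super.sync(recognizer);
}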

The result is that sync no longer fails and the member rule is executed. Parsing the first member fails, and the default recovery then moves on to the next member in the class.
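As for the rule proposed in the question ("if { is already consumed, try to find int or }"), the idea can be sketched as a plain-Java toy model. This is not ANTLR code: tokens are plain strings, the names BodyResyncModel and resyncInBody are hypothetical, and "<EOF>" is added to the resync set only so the loop always terminates. In a real strategy the equivalent loop would presumably live in an overridden sync() and consume tokens through the parser.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Toy model only; token names are plain strings, not ANTLR token types.
public class BodyResyncModel {

    // Resynchronization points inside a class body:
    // the start of a member, the end of the body, or end of input.
    static final Set<String> RESYNC = Set.of("int", "}", "<EOF>");

    /**
     * Skips tokens starting at pos until a resync token is reached.
     * Skipped tokens are appended to 'skipped'; the return value is the
     * index at which parsing would resume. Assumes the list ends with "<EOF>".
     */
    static int resyncInBody(List<String> tokens, int pos, List<String> skipped) {
        int i = pos;
        while (!RESYNC.contains(tokens.get(i))) {
            skipped.add(tokens.get(i));
            i++;
        }
        return i;
    }

    public static void main(String[] args) {
        List<String> input = List.of(
            "class", "T", "{", "y", ";", "int", "x", ";", "}", "<EOF>");
        List<String> skipped = new ArrayList<>();
        int resume = resyncInBody(input, 3, skipped);
        // prints [y, ;] resume at int
        System.out.println(skipped + " resume at " + input.get(resume));
    }
}
```

Only y; is dropped here, and parsing would resume at int x;, which is the result the question was hoping for.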