Why do I get the "mismatched input" error in ANTLR4?

I am very new to ANTLR. I am developing a parser for the GURU language. I wrote a grammar and decided to check it on an online testing site, but I don't understand why I get errors.

Here is my grammar:

grammar Guru;

/*
    PARSER RULES
 */

expertSystem : definition initialization completion rules variables EOF;

// Definition
definition : GOAL ':' expertiseVariable;

// Initialization
initialization : INITIAL ':' (output | assignment | input)+;

// Completion
completion : DO ':' (assignment | output)+;

// Rules
rules : (rule)+;

rule : RULE ':' ruleName
    (auxiliaryElement)* (ready)* 
    IF ':' premise 
    THEN ':' conclusion
    (reason)* (usedVariables)*;

ruleName : IDENTIFIER;

auxiliaryElement : priority | cost | test | comment;
priority : PRIORITY ':' RANGE;
cost : COST ':' RANGE;
test : TEST ':' testValue;
testValue : 'S' | 'E' | 'P';
comment : COMMENT ':' text;

ready : READY ':' (readyCommand)+;
readyCommand : output | assignment;

// TODO: LOGOPERATOR
premise : andExpression;
andExpression : orExpression ('AND' orExpression)*;
orExpression : atomicExpression ('OR' atomicExpression)*;
atomicExpression : '(' premise ')' | comparisonExpression;
comparisonExpression : comparisonOperand COMOPERATOR comparisonOperand;
comparisonOperand : expertiseVariable | value | (function '(' expertiseVariable ')');

conclusion : (assignment)+;

reason : REASON ':' text;

usedVariables : needs | changes;
needs : NEEDS ':' '{' expertiseVariable (',' expertiseVariable)* '}';
changes : CHANGES ':' '{' expertiseVariable (',' expertiseVariable)* '}';

// Variables
variables : (variable)+;
variable : VAR ':' expertiseVariable (variableCommand)*;

variableCommand : find | label | when | cfType | rigor | limit;

find : FIND ':' (findCommand)+;
findCommand : assignment | input;

label : LABEL ':' text;

when : WHEN ':' whenValue;
whenValue : 'F' | 'L' | 'N';

cfType : CFTYPE ':' cfTypeValue cfTypeValue;
cfTypeValue : 'M' | 'P';

rigor : RIGOR ':' rigorValue;
rigorValue : 'M' | 'C' | 'A';

limit : LIMIT ':' NUMBER;

// General rules
output : OUTPUT ':' text;

assignment : expertiseVariable '=' value;

input : INPUT ':' expertiseVariable TYPE ':' TYPES WITH ':' text;

expertiseVariable : IDENTIFIER;

function : IDENTIFIER;

value : STRING | NUMBER;

text: STRING;

/*
    LEXER RULES
 */

GOAL : 'GOAL';
INITIAL : 'INITIAL';
DO : 'DO';
RULE : 'RULE';
IF : 'IF';
THEN : 'THEN';
PRIORITY : 'PRIORITY';
COST: 'COST';
TEST: 'TEST';
COMMENT : 'COMMENT';
READY : 'READY';
REASON : 'REASON';
NEEDS : 'NEEDS';
CHANGES : 'CHANGES';
VAR : 'VAR';
FIND : 'FIND';
LABEL : 'LABEL';
WHEN : 'WHEN';
CFTYPE : 'CFTYPE';
RIGOR : 'RIGOR';
LIMIT : 'LIMIT';
OUTPUT : 'OUTPUT';
INPUT : 'INPUT';
TYPE : 'TYPE';
WITH : 'WITH';

IDENTIFIER : [a-zA-Z_][a-zA-Z0-9_]*;
STRING : '"' ~["]* '"';
RANGE : [1-9] [0-9]? | '100';
NUMBER : '0' | [1-9][0-9]*;
TYPES : 'NUM' | 'STRING' | 'REAL';
LOGOPERATOR : 'AND' | 'OR';
COMOPERATOR : '>' | '<' | '>=' | '<=' | '==';

WS : [ \t\r\n]+ -> skip;

Here is the text I wanted to check:

GOAL: RESH
INITIAL: 
    OUTPUT: "Some text"
COMPLETION:
    DO: OUTPUT: "Some text"
RULE: R1
IF: RESH < 20
THEN: RESH = 20
VAR: RESH

Here are the errors it produces:

1:4 token recognition error at: ':'
2:7 token recognition error at: ':'
3:10 token recognition error at: ':'
3:12 token recognition error at: '"'
3:22 token recognition error at: '"'
4:10 token recognition error at: ':'
5:6 token recognition error at: ':'
5:14 token recognition error at: ':'
5:16 token recognition error at: '"'
5:26 token recognition error at: '"'
6:4 token recognition error at: ':'
7:2 token recognition error at: ':'
7:9 token recognition error at: '<'
8:4 token recognition error at: ':'
9:3 token recognition error at: ':'
1:0 mismatched input 'GOAL' expecting 'GOAL'

What is the problem? I have already rewritten the grammar several times, without success.

Answer by Bart Kiers (best answer):

Most of the errors originate from the fact that you did not clear the rules in the "lexer tab" in ANTLR lab. Once you do that, many of the errors will disappear.

The problems that remain are then these:

RANGE : [1-9] [0-9]? | '100';
NUMBER : '0' | [1-9][0-9]*;

Given these two lexer rules, the input 20 will always become a RANGE token. That is simply how ANTLR produces tokens: it tries to match as many characters as possible for every lexer rule, and when two (or more) lexer rules match the same characters, the one defined first "wins".
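To make the tie-breaking concrete, here is a small Python sketch that imitates this behaviour for just the two rules above. It is not how ANTLR is implemented; the RULES table and the next_token helper are made up for illustration, and plain regular expressions stand in for the lexer rules.

```python
import re

# Lexer rules in definition order; each rule is a list of its alternatives.
# (Hypothetical stand-in for the RANGE and NUMBER rules, not ANTLR itself.)
RULES = [
    ("RANGE", [r"[1-9][0-9]?", r"100"]),
    ("NUMBER", [r"0", r"[1-9][0-9]*"]),
]

def next_token(text, pos=0):
    """Return (rule_name, matched_text) for the token starting at pos."""
    best_name, best_len = None, 0
    for name, alternatives in RULES:
        for alt in alternatives:
            m = re.compile(alt).match(text, pos)
            # Strict '>' means a later rule only wins with a LONGER match,
            # so on a tie the rule defined first is kept.
            if m and len(m.group()) > best_len:
                best_name, best_len = name, len(m.group())
    return best_name, text[pos:pos + best_len]

for sample in ["20", "100", "200", "0"]:
    print(sample, "->", next_token(sample))
```

With these rules, 20 and 100 come out as RANGE (a tie in length, so the first rule wins), while 200 and 0 come out as NUMBER (only NUMBER matches the whole input). That is why a parser rule expecting NUMBER never sees one for the input 20.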

The solution: remove RANGE and replace every RANGE in the parser rules with NUMBER. Then, after parsing, you can perform semantic checks to see whether a NUMBER is valid in a given place. You can do this in an ANTLR listener.
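As a rough sketch of such a semantic check, the Python function below tests whether the text of a NUMBER token falls in the 1 to 100 interval that the old RANGE rule described. The function name is_valid_range is hypothetical; in a real listener you would call something like it from the exit method of the priority and cost rules, passing in the token text.

```python
def is_valid_range(text: str) -> bool:
    """Return True if a NUMBER token's text lies between 1 and 100 inclusive.

    Hypothetical helper: mirrors what the removed RANGE rule accepted.
    """
    # Guard against non-numeric text before converting to int.
    if not (text.isascii() and text.isdigit()):
        return False
    return 1 <= int(text) <= 100

for sample in ["20", "100", "0", "200"]:
    print(sample, "->", is_valid_range(sample))
```

Here 20 and 100 pass, while 0 and 200 are rejected, so the parser can stay permissive and the range restriction is reported as a semantic error instead of a confusing token mismatch.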

The second problem is that the parser does not recognize the input:

COMPLETION:
    DO: OUTPUT: "Some text"

and given the grammar, I do not see which parser rule you are trying to match for this input. The parser rule completion:

completion : DO ':' (assignment | output)+;

seems to be missing the keyword COMPLETION and a ':' at the start. This could be a solution:

completion : COMPLETION ':' DO ':' (assignment | output)+;

...

COMPLETION : 'COMPLETION';