Antlr - mismatched input '1' expecting number

I'm new to Antlr and I have the following simplified language:

grammar Hello;

sentence : targetAttributeName EQUALS expression+ (IF relationedExpression (logicalRelation relationedExpression)*)?;         

expression : 
    '(' expression ')' |
    expression ('*'|'/') expression |
    expression ('+'|'-') expression |   
    function |
    targetAttributeName |
    NUMBER;

filterExpression :
    '(' filterExpression ')' |
    filterExpression ('*'|'/') filterExpression |
    filterExpression ('+'|'-') filterExpression |   
    function |
    filterAttributeName |
    NUMBER |
    DATE;

relationedExpression :
    filterExpression ('<'|'<='|'>'|'>='|'=') filterExpression |
    filterAttributeName '=' STRING |
    STRING '=' filterAttributeName
    ;

logicalRelation :
    'AND' |
    'OR'
    ;

targetAttributeName :
    'x'|
    'y'
;

filterAttributeName :
    'a' |
    'a' '1' |
    targetAttributeName;

function:
    simpleFunction |
    complexFunction ;

simpleFunction : 
    'simpleFunction' '(' expression ')' |
    'simpleFunction2' '(' expression ')'
    ;

complexFunction : 
    'complexFunction' '(' expression ')' |
    'complexFunction2' '(' expression ')'
;

EQUALS : '=';
IF : 'IF';

STRING : '"' [a-zA-Z0-9]* '"';
NUMBER : [-]?[0-9]+('.'[0-9]+)?;
DATE: NUMBER NUMBER NUMBER NUMBER '.' NUMBER NUMBER? '.' NUMBER NUMBER? '.';
WS : [ \t\r\n]+ -> skip ; // skip spaces, tabs, newlines

It works with x = y * 2, but it doesn't work with x =y * 1.

The error message is the following:

Hello::sentence:1:7: mismatched input '1' expecting {'simpleFunction', 'complexFunction', 'x', 'y', 'complexFunction2', '(', 'simpleFunction2', NUMBER}

This seems strange to me, because 1 is a NUMBER...

If I change the second alternative of filterAttributeName from 'a' '1' to 'a1', then x =y * 1 works, but I don't understand the difference between the two cases. Could somebody explain it to me?

Thanks.

Accepted answer by Bart Kiers

By doing this:

filterAttributeName :
    'a' |
    'a' '1' |
    targetAttributeName;

ANTLR creates lexer rules from these inline tokens. So you really have a lexer grammar that looks like this:

T_1 : '1'; // the rule name will probably be different though
T_a : 'a';
...
NUMBER : [-]?[0-9]+('.'[0-9]+)?;

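In other words, the input 1 will be tokenized as T_1, not as a NUMBER, so the parser, which expects a NUMBER at that position, reports a mismatch. The input 2 matches no literal token, so it still becomes a NUMBER. And once the literal is 'a1', it only matches the two characters a1 together, so a lone 1 is a NUMBER again.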
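To make the tie-breaking concrete, here is a minimal Python sketch of a maximal-munch tokenizer. It is not ANTLR's generated code: it assumes, as described above, that the implicit literal tokens sit ahead of NUMBER in the rule list, and it models only a handful of the question's tokens. `make_rules` and `tokenize` are hypothetical helpers.

```python
import re

# Hypothetical model of an ANTLR-style lexer: rules are tried in definition
# order and the longest match wins; on a tie the earlier rule is kept.
# Assumption: implicit literal tokens come before NUMBER, as in the answer.

def make_rules(literals):
    """Build (token_type, pattern) pairs: literals first, then NUMBER and WS."""
    rules = [("'" + lit + "'", re.compile(re.escape(lit))) for lit in literals]
    rules.append(("NUMBER", re.compile(r"-?[0-9]+(\.[0-9]+)?")))
    rules.append(("WS", re.compile(r"[ \t\r\n]+")))  # skipped, like '-> skip'
    return rules


def tokenize(text, rules):
    """Return (token_type, lexeme) pairs; never consults any parser."""
    pos, tokens = 0, []
    while pos < len(text):
        best_type, best_end = None, pos
        for token_type, pattern in rules:
            m = pattern.match(text, pos)
            # Only a strictly longer match replaces the current best, so on
            # equal length the rule defined first wins.
            if m and m.end() > best_end:
                best_type, best_end = token_type, m.end()
        if best_type is None:
            raise SyntaxError("token recognition error at index %d" % pos)
        if best_type != "WS":
            tokens.append((best_type, text[pos:best_end]))
        pos = best_end
    return tokens


# 'a' '1' as two literals: the lone 1 becomes the literal token '1'.
print(tokenize("x =y * 1", make_rules(["x", "y", "=", "*", "a", "1"])))
# 'a1' as one literal: the lone 1 is a NUMBER again.
print(tokenize("x =y * 1", make_rules(["x", "y", "=", "*", "a", "a1"])))
```

With the literal list containing "1", the last token of x =y * 1 is `("'1'", "1")`; with "a1" instead it is `("NUMBER", "1")`. Input such as 11 still becomes a NUMBER in both cases, because the longer match beats the one-character literal.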

EDIT

Whenever the same input can be matched by two or more lexer rules, ANTLR picks the rule that matches the most characters; if several rules match the same number of characters, it chooses the one defined first. The lexer does not "listen" to the parser to see what it needs at a particular moment: lexing and parsing are two distinct phases. This is simply how ANTLR, and many other parser generators, work. If this is not acceptable for you, you should google for "scanner-less parsing" or "packrat parsers".
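For contrast, a scanner-less parser has no separate tokenizing pass: each parse function matches characters directly, so 1 is read as a number exactly where the grammar expects one. The sketch below is a hypothetical, hand-written recursive-descent recognizer for a simplified subset of the question's grammar (only `*`, a single IF condition, and filterAttributeName reduced to `'a' '1' | 'a'`). It does no memoization, so it is not a packrat parser, and it is not something ANTLR generates.

```python
import re

# Hypothetical scannerless recognizer for a simplified subset of the grammar:
#   sentence            : ('x'|'y') '=' expr ('IF' filterAttributeName '=' NUMBER)? ;
#   expr                : atom ('*' atom)* ;
#   atom                : 'x' | 'y' | NUMBER ;
#   filterAttributeName : 'a' '1' | 'a' ;
# There is no tokenizing pass: every method matches characters at the current
# position, so '1' is only treated as the literal where filterAttributeName
# asks for it. No memoization, so this is not a full packrat parser.
NUMBER = re.compile(r"-?[0-9]+(\.[0-9]+)?")


class Parser:
    def __init__(self, text):
        self.text, self.pos = text, 0

    def skip_ws(self):
        while self.pos < len(self.text) and self.text[self.pos] in " \t\r\n":
            self.pos += 1

    def literal(self, lit):
        self.skip_ws()
        if self.text.startswith(lit, self.pos):
            self.pos += len(lit)
            return True
        return False

    def number(self):
        self.skip_ws()
        m = NUMBER.match(self.text, self.pos)
        if m:
            self.pos = m.end()
        return m is not None

    def atom(self):
        return self.literal("x") or self.literal("y") or self.number()

    def expr(self):
        if not self.atom():
            return False
        while self.literal("*"):
            if not self.atom():
                return False
        return True

    def filter_attribute_name(self):
        if not self.literal("a"):
            return False
        self.literal("1")  # optional: the 'a' '1' alternative
        return True

    def sentence(self):
        if not ((self.literal("x") or self.literal("y"))
                and self.literal("=") and self.expr()):
            return False
        if self.literal("IF"):
            if not (self.filter_attribute_name() and self.literal("=")
                    and self.number()):
                return False
        self.skip_ws()
        return self.pos == len(self.text)


def parses(text):
    return Parser(text).sentence()


print(parses("x =y * 1"), parses("x = y * 1 IF a 1 = 2"))
```

In x = y * 1 IF a 1 = 2, the 1 after a is consumed as the literal and the 1 after * as a NUMBER, because each decision is made by the rule that is currently parsing rather than by an earlier, context-free tokenizer.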