I'm new to ANTLR and I have the following simplified language:
grammar Hello;
sentence : targetAttributeName EQUALS expression+ (IF relationedExpression (logicalRelation relationedExpression)*)?;
expression :
'(' expression ')' |
expression ('*'|'/') expression |
expression ('+'|'-') expression |
function |
targetAttributeName |
NUMBER;
filterExpression :
'(' filterExpression ')' |
filterExpression ('*'|'/') filterExpression |
filterExpression ('+'|'-') filterExpression |
function |
filterAttributeName |
NUMBER |
DATE;
relationedExpression :
filterExpression ('<'|'<='|'>'|'>='|'=') filterExpression |
filterAttributeName '=' STRING |
STRING '=' filterAttributeName
;
logicalRelation :
'AND' |
'OR'
;
targetAttributeName :
'x'|
'y'
;
filterAttributeName :
'a' |
'a' '1' |
targetAttributeName;
function:
simpleFunction |
complexFunction ;
simpleFunction :
'simpleFunction' '(' expression ')' |
'simpleFunction2' '(' expression ')'
;
complexFunction :
'complexFunction' '(' expression ')' |
'complexFunction2' '(' expression ')'
;
EQUALS : '=';
IF : 'IF';
STRING : '"' [a-zA-Z0-9]* '"';
NUMBER : [-]?[0-9]+('.'[0-9]+)?;
DATE: NUMBER NUMBER NUMBER NUMBER '.' NUMBER NUMBER? '.' NUMBER NUMBER? '.';
WS : [ \t\r\n]+ -> skip ; // skip spaces, tabs, newlines
It works with x = y * 2, but it doesn't work with x =y * 1.
The error message is the following:
Hello::sentence:1:7: mismatched input '1' expecting {'simpleFunction', 'complexFunction', 'x', 'y', 'complexFunction2', '(', 'simpleFunction2', NUMBER}
This seems very strange to me, because 1 is a NUMBER...
If I change filterAttributeName from 'a' '1' to 'a1', then it works with x=y*1, but I don't understand the difference between the two cases. Could somebody explain it to me?
Thanks.
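By using literal tokens inside parser rules, such as 'a' '1' in filterAttributeName, you make ANTLR create lexer rules from these inline tokens. So you really have a lexer grammar in which every such literal, including '1', is its own token rule, placed ahead of your explicit rules such as NUMBER.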
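As a rough sketch of what that implied lexer grammar might look like: the rule names T_a, T_1, T_x and T_y below are illustrative assumptions (ANTLR generates its own names for implicit tokens), and only a few of the literals and explicit rules are shown.

```antlr
// Implicit rules created from literals used in parser rules.
// The names are hypothetical; ANTLR picks its own.
T_a : 'a';
T_1 : '1';
T_x : 'x';
T_y : 'y';
// ... one rule for each remaining literal ('(', ')', '*', 'AND', and so on)

// The explicit lexer rules from the grammar come after the implicit ones.
IF     : 'IF';
NUMBER : [-]? [0-9]+ ('.' [0-9]+)?;
// ... remaining explicit rules (STRING, DATE, WS)
```

With this ordering, the input 2 can only be matched by NUMBER, while the input 1 is matched equally well by T_1 and by NUMBER, and the rule defined first wins.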
In other words, the input 1 will be tokenized as T_1, not as a NUMBER.
EDIT
Whenever the same input can be matched by two or more lexer rules (with matches of equal length), ANTLR chooses the one defined first. The lexer does not "listen" to the parser to see what it needs at a particular time. Lexing and parsing are two distinct phases. This is simply how ANTLR works, and many other parser generators as well. If this is not acceptable for you, you should google for "scanner-less parsing" or "packrat parsers".
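One possible way to stay within ANTLR's two-phase model, matching the observation in the question that 'a1' works, is to keep the whole attribute name in a single literal so that no separate '1' token exists. This is only a sketch of that idea, not the only fix:

```antlr
// Sketch: 'a1' is a single literal, so the lexer has no '1' token
// that could shadow NUMBER.
filterAttributeName
    : 'a'
    | 'a1'
    | targetAttributeName
    ;
```

For the input a1 the lexer prefers the longer match 'a1' over 'a', and a lone 1 can now only be a NUMBER. The trade-off is that an attribute written with whitespace between a and 1 is no longer recognized as one name.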