ANTLR4 grammar conflicting rules


I should start by saying that I'm new to grammars and still learning my way around ANTLR.

My grammar allows the following operations:

grammar TEST;

file :  (varDecl | functionDcl)+    ;
varDecl :   type ID ('=' expression)?   ';'     ;
type :  'int' | 'float' | 'void'    ;
functionDcl :   type ID '(' formalParameters? ')' block ;
formalParameters :  formalParameter (',' formalParameter)*  ;   
formalParameter :   type ID     ;
block : '{' stat* '}'   ;
stat :  block
     |  varDecl
     |  'if' expression 'then' stat ('else' stat)?
     |  'return' expression? ';'
     |  expression '=' expression ';'
     |  expression ';'
     ;
expression : unaryExprNotPlusMinus (intervalOp unaryExprNotPlusMinus)? ;
unaryExprNotPlusMinus :  unaryOp expression  
                      |  INT 
                      |  FloatingPointLiteral
                      ;


unaryOp : '~' | '!' | 'not' | 'typeof' | 'statictypeof';
intervalOp : '..' | '|..' | '..|' | '|..|'  ;

INT : JavaIDDigit+ ;
ID : Letter (Letter|JavaIDDigit)* ;

fragment
Letter
    :  '\u0024' |
       '\u0041'..'\u005a' |
       '\u005f' |
       '\u0061'..'\u007a' |
       '\u00c0'..'\u00d6' |
       '\u00d8'..'\u00f6' |
       '\u00f8'..'\u00ff' |
       '\u0100'..'\u1fff' |
       '\u3040'..'\u318f' |
       '\u3300'..'\u337f' |
       '\u3400'..'\u3d2d' |
       '\u4e00'..'\u9fff' |
       '\uf900'..'\ufaff'
    ;

fragment
JavaIDDigit
    :  '\u0030'..'\u0039' |
       '\u0660'..'\u0669' |
       '\u06f0'..'\u06f9' |
       '\u0966'..'\u096f' |
       '\u09e6'..'\u09ef' |
       '\u0a66'..'\u0a6f' |
       '\u0ae6'..'\u0aef' |
       '\u0b66'..'\u0b6f' |
       '\u0be7'..'\u0bef' |
       '\u0c66'..'\u0c6f' |
       '\u0ce6'..'\u0cef' |
       '\u0d66'..'\u0d6f' |
       '\u0e50'..'\u0e59' |
       '\u0ed0'..'\u0ed9' |
       '\u1040'..'\u1049'
   ;

FloatingPointLiteral
    :   ('0'..'9')+ '.' ('0'..'9')* Exponent? FloatTypeSuffix?
    |   '.' ('0'..'9')+ Exponent? FloatTypeSuffix?
    |   ('0'..'9')+ Exponent FloatTypeSuffix?
    |   ('0'..'9')+ FloatTypeSuffix
    |   ('0x' | '0X') (HexDigit )*
        ('.' (HexDigit)*)?
        ( 'p' | 'P' )
        ( '+' | '-' )?
        ( '0' .. '9' )+
        FloatTypeSuffix?
    ;

fragment
Exponent : ('e'|'E') ('+'|'-')? ('0'..'9')+ ;

fragment
FloatTypeSuffix : ('f'|'F'|'d'|'D'|'bd'|'BD') ;

fragment
HexDigit : ('0'..'9'|'a'..'f'|'A'..'F') ;

COMMENT
    :   '/*' .*? '*/'    -> channel(HIDDEN) 
    ;
WS  :   [ \r\t\u000C\n]+ -> channel(HIDDEN)
    ;

LINE_COMMENT
    : '//' ~[\r\n]* '\r'? '\n' -> channel(HIDDEN)
    ;

I grabbed a few things from other grammars I had been experimenting with. My main issue is with my expression rule. Given the input int aaaa = 0..|9; I expected the parse tree to contain the ..| interval operator, but instead 0. is interpreted as a float and the rest of the input is not parsed properly.

It works fine if I put a space after my 0 like this: int aaaa = 0 ..|9;

I need this to work without the space.

Any ideas?

Thanks!

There is 1 answer

Answered by GRosenberg

The lexer tokenizes the character input before the parser executes. The parser only operates on tokens. Convert the unaryOp and intervalOp rules to lexer rules so they can participate in the tokenization process.
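To illustrate why the input breaks before the parser ever sees it, here is a minimal Python sketch of longest-match tokenization. It is not ANTLR itself: the regular expressions are simplified stand-ins for the grammar's tokens (ASCII digits only, no exponent or suffix), and the names RULES and tokenize are made up for this example.

```python
import re

# Simplified stand-ins for the grammar's tokens (assumptions: ASCII digits
# only, no exponent or type suffix; identifiers, '=' and ';' are omitted).
RULES = [
    ("INTERVAL_OP", re.compile(r"\|\.\.\||\|\.\.|\.\.\||\.\.")),
    ("FLOAT", re.compile(r"[0-9]+\.[0-9]*|\.[0-9]+")),
    ("INT", re.compile(r"[0-9]+")),
    ("WS", re.compile(r"[ \t\r\n]+")),
]


def tokenize(text):
    """Longest-match tokenization; earlier rules win ties."""
    tokens = []
    pos = 0
    while pos < len(text):
        best_name, best_end = None, pos
        for name, pattern in RULES:
            m = pattern.match(text, pos)
            if m and m.end() > best_end:
                best_name, best_end = name, m.end()
        if best_name is None:
            # No rule matches here: record the character and move on.
            tokens.append(("ERROR", text[pos]))
            pos += 1
            continue
        if best_name != "WS":  # whitespace is dropped in this sketch
            tokens.append((best_name, text[pos:best_end]))
        pos = best_end
    return tokens


print(tokenize("0..|9"))
print(tokenize("0 ..|9"))
```

Without the space, the float pattern wins at the first character because "0." is longer than the integer "0", and what remains (".|9") no longer starts with an interval operator. With the space, "0" can only be an integer, so "..|" is matched as a single token.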

Update

@CoronA makes a valid observation. The suggested change alone is not sufficient. Moving the unaryOp and intervalOp rules to the lexer and changing the FloatingPointLiteral rule to

FloatingPointLiteral
    :   ('0'..'9')+ '.' ('0'..'9')+ Exponent? FloatTypeSuffix?
    ...

(requiring at least one digit after the decimal point) is sufficient to properly match the input both with and without the space. There are alternatives, but the OP first needs to clarify whether a naked decimal (a number ending in a bare decimal point, such as 0.) must be allowed.
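The effect of that one-character change can be checked with a small Python sketch. The two regular expressions are assumed equivalents of the first FloatingPointLiteral alternative before and after the change, with Exponent and FloatTypeSuffix left out; ORIGINAL, REVISED and matched_prefix are names invented for this example.

```python
import re

# Assumed regex equivalents of the first FloatingPointLiteral alternative,
# without Exponent and FloatTypeSuffix.
ORIGINAL = re.compile(r"[0-9]+\.[0-9]*")  # digits '.' digits*
REVISED = re.compile(r"[0-9]+\.[0-9]+")   # digits '.' digits+


def matched_prefix(pattern, text):
    """Return the prefix of text that the pattern consumes, or None."""
    m = pattern.match(text)
    return m.group(0) if m else None


for sample in ("0..|9", "0.5", "0."):
    print(sample, matched_prefix(ORIGINAL, sample), matched_prefix(REVISED, sample))
```

With the revised pattern, "0..|9" no longer starts with a float, so the leading "0" is free to be lexed as an integer, while "0.5" is unaffected. The trade-off is that "0." is no longer matched by this alternative, which is the naked-decimal question raised above.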