Not able to parse continuous string using ANTLR (without spaces)


I have to parse the following query using ANTLR:

sys_nameLIKEvalue

Here sys_name is a variable consisting of lowercase letters and underscores. LIKE is a fixed keyword.

value is a variable which can contain lowercase letters, uppercase letters, and digits.

Below is the grammar I am using:

expression : parameter 'LIKE' values EOF;
parameter : (ID);
ID : (LOWERCASE) (LOWERCASE | UNDERSCORE)* ;
values : (VALUE);
VALUE : (LOWERCASE | NUMBER | UPPERCASE)+ ;
LOWERCASE : 'a'..'z' ;
UPPERCASE : 'A'..'Z' ;
NUMBER : '0'..'9' ;
UNDERSCORE : '_' ;

Test Case 1

Input : sys_nameLIKEabc

error thrown : line 1:8 missing 'LIKE' at 'LIKEabc'

Test Case 2

Input : sysnameLIKEabc

error thrown : line 1:0 mismatched input 'sysnameLIKEabc' expecting ID

There is 1 answer

Bart Kiers

A literal token inside your parser rule will be translated into a plain lexer rule. So, your grammar really looks like this:

expression : parameter LIKE values EOF;
parameter  : ID;  
values     : VALUE;

LIKE       : 'LIKE';
ID         : LOWERCASE (LOWERCASE | UNDERSCORE)* ; 
VALUE      : (LOWERCASE | NUMBER | UPPERCASE)+ ;

// Fragment rules will never become tokens of their own: good practice!
fragment LOWERCASE  : 'a'..'z' ;
fragment UPPERCASE  : 'A'..'Z' ;
fragment NUMBER     : '0'..'9' ;
fragment UNDERSCORE : '_' ;

Lexer rules are greedy: the lexer always produces the longest possible match, and if two or more lexer rules match the same number of characters, the rule defined first "wins". Your input is therefore tokenized as follows:

Input: sys_nameLIKEabc, 2 tokens:

  • sys_name: ID
  • LIKEabc: VALUE

Input: sysnameLIKEabc, 1 token:

  • sysnameLIKEabc: VALUE
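
The tokenization above can be reproduced with a small simulation of the "longest match, first rule wins on a tie" behaviour. This is only a sketch in Python, not ANTLR itself: the regular expressions are hand translations of the lexer rules shown earlier, and the names RULES and tokenize are made up for this illustration.

```python
import re

# Hand-translated lexer rules, in grammar order (an assumption of this
# sketch). The order only matters when two rules match the same length.
RULES = [
    ("LIKE", re.compile(r"LIKE")),
    ("ID", re.compile(r"[a-z][a-z_]*")),
    ("VALUE", re.compile(r"[a-zA-Z0-9]+")),
]


def tokenize(text):
    """Split text into (rule name, lexeme) pairs using longest match."""
    tokens = []
    pos = 0
    while pos < len(text):
        best_name, best_len = None, 0
        for name, pattern in RULES:
            match = pattern.match(text, pos)
            # Only a strictly longer match replaces the current best,
            # so on a tie the rule listed first is kept.
            if match and match.end() - pos > best_len:
                best_name, best_len = name, match.end() - pos
        if best_name is None:
            raise ValueError(f"no rule matches at position {pos}")
        tokens.append((best_name, text[pos:pos + best_len]))
        pos += best_len
    return tokens


print(tokenize("sys_nameLIKEabc"))
# [('ID', 'sys_name'), ('VALUE', 'LIKEabc')]
print(tokenize("sysnameLIKEabc"))
# [('VALUE', 'sysnameLIKEabc')]
```

In both runs the LIKE rule does match four characters at some point, but VALUE matches more characters at the same position, so LIKE is never emitted.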

So the token LIKE is never created for your test input, and your expression rule can never match. It also seems a bit odd to parse input without any delimiters, such as spaces.

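To fix your issue, you will either have to introduce delimiters or disallow uppercase letters in VALUE. Note that with the second option a purely lowercase value such as abc matches ID and VALUE with the same length, so it becomes an ID (the rule defined first), and values would then have to accept ID as well.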
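
As a rough check of the "no uppercase in VALUE" option, here is a Python sketch that simulates the lexer with that change. It is not ANTLR output: the regular expressions, the helper names tokenize_fixed and matches_expression, and the decision to let values accept either ID or VALUE are assumptions made for this illustration.

```python
import re

# Assumed fix: VALUE no longer accepts uppercase letters.
FIXED_RULES = [
    ("LIKE", re.compile(r"LIKE")),
    ("ID", re.compile(r"[a-z][a-z_]*")),
    ("VALUE", re.compile(r"[a-z0-9]+")),
]


def tokenize_fixed(text):
    """Longest match per position; the first rule wins on a tie."""
    tokens, pos = [], 0
    while pos < len(text):
        candidates = []
        for name, pattern in FIXED_RULES:
            match = pattern.match(text, pos)
            if match:
                candidates.append((name, match.group()))
        if not candidates:
            raise ValueError(f"no rule matches at position {pos}")
        # max() returns the first of several equally long candidates,
        # which mirrors "the first rule wins" on a tie.
        name, lexeme = max(candidates, key=lambda c: len(c[1]))
        tokens.append((name, lexeme))
        pos += len(lexeme)
    return tokens


def matches_expression(text):
    """Check the token sequence ID LIKE (ID | VALUE) and nothing else."""
    try:
        kinds = [name for name, _ in tokenize_fixed(text)]
    except ValueError:
        return False
    return (len(kinds) == 3 and kinds[0] == "ID" and kinds[1] == "LIKE"
            and kinds[2] in ("ID", "VALUE"))


print(tokenize_fixed("sys_nameLIKEabc"))
# [('ID', 'sys_name'), ('LIKE', 'LIKE'), ('ID', 'abc')]
print(tokenize_fixed("sysnameLIKEabc1"))
# [('ID', 'sysname'), ('LIKE', 'LIKE'), ('VALUE', 'abc1')]
```

With this change the LIKE token is produced for both test inputs, at the cost of values that can no longer contain uppercase letters.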