ANTLR3: MissingTokenException when an already-defined token appears in the input in place of an ID


I stumbled upon this problem while writing grammar rules for CREATE TABLE. My grammar fails when a column name is a token that is already defined in the grammar (for example, a column named 'create' gets matched as the 'create' keyword instead of as a column name).

Simple use case:

grammar hello;


start   :   
    'hello' 'world' ID 
    ;

ID : 'a'..'z'+ ;
WS : (' '|'\n'|'\r')+ {$channel=HIDDEN;} ;

With this grammar, how do I make "hello world hello" a valid input? Currently it fails with a MissingTokenException.

AST

                root
                 |   
                start
    __________________________________
    |              |                  |
  hello          world             MissingTokenException   
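A minimal sketch of why the third word fails, assuming ANTLR3's usual behavior for combined grammars: the literals 'hello' and 'world' used in the `start` rule become implicit lexer tokens, and they win over `ID` whenever both match the same text. The input therefore reaches the parser as HELLO WORLD HELLO instead of HELLO WORLD ID, and the parser reports the missing ID. The Java below is a hypothetical hand-written imitation of that tie-breaking (the class `KeywordLexerDemo` and its methods are made up for illustration); it is not code ANTLR generates and it does not check the 'a'..'z' character range.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical imitation of the lexer generated from the "hello" grammar.
 * It only models one rule: when a keyword literal and ID match the same
 * text, the keyword wins. NOT ANTLR-generated code.
 */
public class KeywordLexerDemo {
    enum Type { HELLO, WORLD, ID }

    // Splitting on whitespace mirrors WS being sent to the hidden channel.
    static List<Type> tokenize(String input) {
        List<Type> types = new ArrayList<>();
        for (String word : input.trim().split("\\s+")) {
            switch (word) {
                case "hello": types.add(Type.HELLO); break; // literal 'hello' beats ID
                case "world": types.add(Type.WORLD); break; // literal 'world' beats ID
                default:      types.add(Type.ID);    break;
            }
        }
        return types;
    }

    // Mirrors: start : 'hello' 'world' ID ;
    static boolean matchesStart(List<Type> tokens) {
        return tokens.equals(Arrays.asList(Type.HELLO, Type.WORLD, Type.ID));
    }

    public static void main(String[] args) {
        List<Type> tokens = tokenize("hello world hello");
        System.out.println(tokens);               // prints [HELLO, WORLD, HELLO]
        System.out.println(matchesStart(tokens)); // prints false: the third token is not an ID
    }
}
```

Any third word that is not a keyword, such as "hello world bob", produces HELLO WORLD ID and matches.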

Thanks in advance.

EDIT:

I found this approach, which puts a semantic predicate in the rule definitions for "hello" and "world"; I still have to figure out how it works.

grammar hello;


stat: keyHELLO keyWORLD expr 
    ;

expr: ID
;

/** An ID whose text is "hello" */
keyHELLO : {input.LT(1).getText().equals("hello")}? ID ;

/** An ID whose text is "world" */
keyWORLD : {input.LT(1).getText().equals("world")}? ID ;

ID : 'a'..'z'+ ;
WS : (' '|'\n'|'\r')+ {$channel=HIDDEN;} ;

AST

                root
                 |   
                stat
    __________________________________
    |              |                  |
 keyHELLO        keyWORLD             expr   
    |               |                  |
  hello           world             world
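This is how the predicate version works. Because the grammar contains no 'hello' or 'world' literals, every word is lexed as `ID`. `{...}?` is a semantic predicate. In the Java target, `input` is the parser's token stream and `input.LT(1)` is the next token, so `keyHELLO` matches an `ID` only when its text is "hello". The sketch below imitates that logic with a hand-written recursive-descent parser. `PredicateParserDemo` and its methods are hypothetical names. The `IllegalStateException` stands in for the `FailedPredicateException` that ANTLR3 raises when a validating predicate fails.

```java
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical hand-written parser imitating:
 *   stat     : keyHELLO keyWORLD expr ;
 *   keyHELLO : {input.LT(1).getText().equals("hello")}? ID ;
 *   keyWORLD : {input.LT(1).getText().equals("world")}? ID ;
 *   expr     : ID ;
 * NOT ANTLR-generated code.
 */
public class PredicateParserDemo {
    private final List<String> tokens; // every token is an ID; only its text is kept
    private int pos = 0;

    PredicateParserDemo(String input) {
        this.tokens = Arrays.asList(input.trim().split("\\s+"));
    }

    // Stand-in for input.LT(1).getText(): peek at the next token without consuming it.
    private String lt1() {
        return pos < tokens.size() ? tokens.get(pos) : "<EOF>";
    }

    // Match a single ID: any word is accepted and consumed.
    private String matchId() {
        if (pos >= tokens.size()) {
            throw new IllegalStateException("missing ID at <EOF>");
        }
        return tokens.get(pos++);
    }

    // keyHELLO / keyWORLD: evaluate the predicate first, then match the ID.
    private String keyword(String expected) {
        if (!lt1().equals(expected)) { // the {...}? check
            throw new IllegalStateException(
                "predicate failed: expected \"" + expected + "\", got \"" + lt1() + "\"");
        }
        return matchId();
    }

    // stat : keyHELLO keyWORLD expr ;  returns the text matched by expr
    String stat() {
        keyword("hello");
        keyword("world");
        return matchId();
    }

    public static void main(String[] args) {
        System.out.println(new PredicateParserDemo("hello world hello").stat()); // prints hello
    }
}
```

One consequence of this design is that the lexer no longer distinguishes keywords at all. Every keyword check becomes a string comparison in the parser.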

Hope this helps.

Answer by Juan Aguilar Guisado

The languages you parse almost always have "reserved words", and it is very hard to work without them. In your case, you have two options:

  1. Define a group of reserved words and make your ID an extension of it. I don't recommend this option, because it becomes a mess once you work with a full grammar, or when you use your lexer/parser/tree and want to handle some tokens or reserved words differently (for example, skip them).

    RW: 'hello' | 'world';
    
    ID: 'a'..'z'+ | RW;
    

In this case you can be sure when you are parsing an RW, but not when you are parsing an ID, because ID and RW go through the same rule.

  2. Choose different reserved words, ones less likely to collide with identifiers than 'hello' or similar ordinary words. For me, this is the best option.

With the second option, you can define a separate subgroup for the new reserved words I mentioned. You can also single out a selected group of words from your vocabulary ('world', 'hello', etc.) and handle them differently in your tree.
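A common parser-level alternative to option 1 keeps 'hello' and 'world' as their own tokens but accepts them wherever an identifier is allowed. In ANTLR3 this could be written as `id : ID | HELLO | WORLD ;` with `start : HELLO WORLD id ;`, where `HELLO` and `WORLD` are assumed lexer rules for the keywords, listed before `ID`. This is also what would let a column be named 'create'. Because the token type survives, you can still treat keywords differently in your tree. The Java below is a hypothetical hand-written analogue of those two rules. `KeywordAsIdDemo` and its members are made up for illustration, and this is not ANTLR output.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical analogue of:
 *   start : HELLO WORLD id ;
 *   id    : ID | HELLO | WORLD ;
 * Keywords stay distinct tokens; only the parser is widened.
 * NOT ANTLR-generated code.
 */
public class KeywordAsIdDemo {
    enum Type { HELLO, WORLD, ID }

    static final class Token {
        final Type type;
        final String text;
        Token(Type type, String text) { this.type = type; this.text = text; }
    }

    private final List<Token> tokens = new ArrayList<>();
    private int pos = 0;

    KeywordAsIdDemo(String input) {
        for (String w : input.trim().split("\\s+")) {
            // The lexer still prefers keywords, exactly as in the original grammar.
            Type t = w.equals("hello") ? Type.HELLO : w.equals("world") ? Type.WORLD : Type.ID;
            tokens.add(new Token(t, w));
        }
    }

    // Match one token of an exact type, as for HELLO and WORLD in 'start'.
    private Token match(Type expected) {
        if (pos >= tokens.size() || tokens.get(pos).type != expected) {
            throw new IllegalStateException("expected " + expected);
        }
        return tokens.get(pos++);
    }

    // id : ID | HELLO | WORLD ;  any of the three token types is accepted
    private Token id() {
        if (pos < tokens.size()) {
            Type t = tokens.get(pos).type;
            if (t == Type.ID || t == Type.HELLO || t == Type.WORLD) {
                return tokens.get(pos++);
            }
        }
        throw new IllegalStateException("expected identifier");
    }

    // start : HELLO WORLD id ;  returns the identifier token so its type is still visible
    Token start() {
        match(Type.HELLO);
        match(Type.WORLD);
        return id();
    }

    public static void main(String[] args) {
        Token t = new KeywordAsIdDemo("hello world hello").start();
        System.out.println(t.text + " (" + t.type + ")"); // prints: hello (HELLO)
    }
}
```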

I hope this helps!