ANTLR -- use predicates to insert a token

I am trying to understand ANTLR predicates. To that end, I have a simple lexer and parser, shown below.

What I would like to do is use a predicate to insert the word "fubar" whenever the input contains "foo" followed by some whitespace and then "bar", so that "fubar" ends up between the two. I want to do this while keeping the same basic grammar structure. Bonus points for doing it in the lexer. Further bonus points if it can be done without any target-language code at all. If target-language code is necessary, the target is C#.

For example, if the input string is:

programmers use the words foo bar and bar foo class

the output would be:

programmers use the words foo fubar bar and bar foo class

Lexer:

lexer grammar TextLexer;

@members
{
    protected const int EOF = Eof;
    protected const int HIDDEN = Hidden;
}

FOO: 'foo';
BAR: 'bar';
TEXT: [a-z]+ ;

WS
    :   ' ' -> channel(HIDDEN)
    ;

Parser:

parser grammar TextParser;

options { tokenVocab=TextLexer; }

@members
{
    protected const int EOF = Eof;
}

file: words EOF;

word
    :   FOO
    |   BAR
    |   TEXT
    ;

words
    :   word
    |   word words
    ;

compileUnit
    :   EOF
    ;

There is 1 answer

Answer by Bart Kiers:

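ANTLR3's lexer might have needed a predicate in this case, but ANTLR4's lexer is much "smarter": it takes the longest match and falls back to a shorter rule such as TEXT when the longer one does not match. You can match "foo bar" in a single lexer rule and change the token's text with setText(...) (that is the Java target's call; in the C# target, assign the lexer's Text property instead):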

FOO_BAR
 : 'foo' [ \t]+ 'bar' {setText("fubar");}
 ;

TEXT
 : [a-z]+ 
 ;

WS
 : ' ' -> channel(HIDDEN)
 ;
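
Note that the rule above turns the whole match into the text "fubar", whereas the question asks for "foo fubar bar". A minimal sketch of a variant for the C# target follows. It assumes that a single combined token is acceptable (it does not emit three separate tokens), that the token text is set through the lexer's Text property in the C# runtime, and that the parser's word rule is changed to accept FOO_BAR in place of FOO and BAR. It is an illustration under those assumptions, not a tested drop-in.

```antlr
lexer grammar TextLexer;

// Matches "foo", whitespace, "bar" as ONE token and rewrites its text so
// that "fubar" appears between the two words.
// Assumption: C# target, where Text is the lexer property for the token text.
FOO_BAR
 : 'foo' [ \t]+ 'bar' { Text = "foo fubar bar"; }
 ;

// A lone "foo" or "bar" falls through to TEXT, because FOO_BAR only
// matches when all three parts are present.
TEXT
 : [a-z]+
 ;

WS
 : ' ' -> channel(HIDDEN)
 ;
```

With this sketch, joining the default-channel token texts of the example input with single spaces should give "programmers use the words foo fubar bar and bar foo class". The original whitespace between "foo" and "bar" is not preserved, since the token text is replaced wholesale.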