How to add back comments/whitespace in a translator using ANTLR4's visitor model


I'm currently writing a TSQL (Sybase/Microsoft SQL) to MySQL translator using the ANTLR4 visitor approach.

I'm able to push comments and whitespace to separate channels so that I can use that information later.

What's not clear is:

  1. How do I get the data back?
  2. More importantly, how do I plug the comments and whitespace back into my translated MySQL code?

Re: #1, this seems to work to get the list of all tokens, including the comments and whitespace:

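import java.util.List;
import org.antlr.v4.runtime.*;

// Note: getTokens() returns every buffered token, whatever its channel.
// The channel passed to CommonTokenStream only controls which tokens the parser sees.
public static List<Token> getHiddenTokensFromString(String sqlIn, int hiddenChannel) {
    CharStream charStream = CharStreams.fromString(sqlIn);
    // CaseChangingCharStream is a helper class, not part of the ANTLR runtime jar.
    CaseChangingCharStream upper = new CaseChangingCharStream(charStream, true);
    TSqlLexer lexer = new TSqlLexer(upper);
    CommonTokenStream commonTokenStream = new CommonTokenStream(lexer, hiddenChannel);
    commonTokenStream.fill(); // tokenize the whole input up to EOF
    return commonTokenStream.getTokens();
}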

Re: #2, what makes this particularly challenging is that, as part of the translation, some lines of SQL have to be moved, some removed, and some added.

Any help will be greatly appreciated.

Thanks.

There is 1 answer

Answer by Mike Lischke

The ANTLR4 lexer produces a sequence of tokens, each with an index (a running number). Unless a lexer rule discards a token (for example with -> skip), every token remains available for inspection once the parsing step is done, regardless of its channel (the channel is just a numeric property on the token).

So, for a token you want to translate, get its index and then ask the token stream for the tokens at the next lower or next higher index. These are usually the hidden whitespace tokens.
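The neighbour lookup described above can be sketched in plain Java. This is only an illustration under stated assumptions: Tok, HiddenNeighbours, and sample() are hypothetical stand-ins, not ANTLR classes, and the sample stream is hand-built rather than produced by a lexer. In the real runtime, BufferedTokenStream (the base class of CommonTokenStream) provides getHiddenTokensToLeft(int tokenIndex, int channel) and getHiddenTokensToRight(int tokenIndex, int channel) for the same purpose.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class HiddenNeighbours {
    static final int DEFAULT_CHANNEL = 0; // tokens the parser sees
    static final int HIDDEN_CHANNEL = 1;  // whitespace and comments

    /** Hypothetical stand-in for an ANTLR token; its index is its position in the list. */
    static final class Tok {
        final int channel;
        final String text;
        Tok(int channel, String text) { this.channel = channel; this.text = text; }
    }

    /** Off-channel tokens directly before position i, returned in source order. */
    static List<Tok> hiddenToLeft(List<Tok> tokens, int i) {
        List<Tok> found = new ArrayList<>();
        for (int j = i - 1; j >= 0 && tokens.get(j).channel != DEFAULT_CHANNEL; j--) {
            found.add(tokens.get(j));
        }
        Collections.reverse(found); // collected right to left, so flip the order
        return found;
    }

    /** Off-channel tokens directly after position i, returned in source order. */
    static List<Tok> hiddenToRight(List<Tok> tokens, int i) {
        List<Tok> found = new ArrayList<>();
        for (int j = i + 1; j < tokens.size() && tokens.get(j).channel != DEFAULT_CHANNEL; j++) {
            found.add(tokens.get(j));
        }
        return found;
    }

    // Hand-built token stream for a small SELECT statement with a block comment.
    static List<Tok> sample() {
        int D = DEFAULT_CHANNEL, H = HIDDEN_CHANNEL;
        String[] texts = {"SELECT", " ", "/* top 1 */", " ", "a", "\n", "FROM", " ", "t"};
        int[] channels = {D, H, H, H, D, H, D, H, D};
        List<Tok> tokens = new ArrayList<>();
        for (int k = 0; k < texts.length; k++) {
            tokens.add(new Tok(channels[k], texts[k]));
        }
        return tokens;
    }

    public static void main(String[] args) {
        List<Tok> tokens = sample();
        int indexOfA = 4; // the visible token "a"
        for (Tok t : hiddenToLeft(tokens, indexOfA)) {
            System.out.println("left:  [" + t.text + "]");
        }
        for (Tok t : hiddenToRight(tokens, indexOfA)) {
            System.out.println("right: [" + t.text.replace("\n", "\\n") + "]");
        }
    }
}
```

Both helpers stop at the first token on the default channel, so a comment is only ever attributed to the visible tokens it sits directly between.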

Once you have the whitespace token, use its start and stop index to get the original text from the char stream. Since you know where you are in the translation process at that point, it should be easy to determine where to insert the original text.
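As a rough sketch of that last step, the plain-Java example below cuts the hidden text out of the source by character offsets and emits it in front of each translated token. Everything here is an assumption made for illustration: Tok, ReinsertHidden, tokensFor(), and the GETDATE to NOW mapping are hypothetical, and the tokens are laid out by hand. With the real runtime the offsets would come from Token.getStartIndex() and Token.getStopIndex() (both inclusive), and the text from CharStream.getText(Interval).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class ReinsertHidden {
    static final int DEFAULT_CHANNEL = 0; // tokens the parser sees
    static final int HIDDEN_CHANNEL = 1;  // whitespace and comments

    /** Hypothetical stand-in for an ANTLR token: channel plus inclusive character offsets. */
    static final class Tok {
        final int channel;
        final int start;
        final int stop;
        Tok(int channel, int start, int stop) {
            this.channel = channel;
            this.start = start;
            this.stop = stop;
        }
    }

    /** Original text of a token, cut out of the source by its offsets. */
    static String textOf(String source, Tok t) {
        return source.substring(t.start, t.stop + 1);
    }

    /** Original text of the off-channel tokens directly before position i. */
    static String leadingHidden(String source, List<Tok> tokens, int i) {
        int first = i;
        while (first > 0 && tokens.get(first - 1).channel != DEFAULT_CHANNEL) {
            first--;
        }
        StringBuilder sb = new StringBuilder();
        for (int j = first; j < i; j++) {
            sb.append(textOf(source, tokens.get(j)));
        }
        return sb.toString();
    }

    /** Rewrites each visible token via the map, keeping its leading whitespace and comments. */
    static String translate(String source, List<Tok> tokens, Map<String, String> replacements) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < tokens.size(); i++) {
            Tok t = tokens.get(i);
            if (t.channel != DEFAULT_CHANNEL) {
                continue; // hidden tokens are emitted with the visible token that follows them
            }
            out.append(leadingHidden(source, tokens, i));
            String original = textOf(source, t);
            out.append(replacements.getOrDefault(original, original));
        }
        return out.toString();
    }

    /** Lays the pieces end to end and records the offsets a lexer would have assigned. */
    static List<Tok> tokensFor(String[] pieces, int[] channels) {
        List<Tok> tokens = new ArrayList<>();
        int pos = 0;
        for (int i = 0; i < pieces.length; i++) {
            tokens.add(new Tok(channels[i], pos, pos + pieces[i].length() - 1));
            pos += pieces[i].length();
        }
        return tokens;
    }

    public static void main(String[] args) {
        int D = DEFAULT_CHANNEL, H = HIDDEN_CHANNEL;
        String[] pieces = {"SELECT", "  ", "/* ts */", " ", "GETDATE", "(", ")"};
        int[] channels = {D, H, H, H, D, D, D};
        String source = String.join("", pieces);
        String result = translate(source, tokensFor(pieces, channels), Map.of("GETDATE", "NOW"));
        System.out.println(result); // prints: SELECT  /* ts */ NOW()
    }
}
```

This sketch attaches hidden text only to the token that follows it, so each comment is emitted once, and anything after the last visible token is dropped. In a visitor that reorders or removes statements, one plausible variation is to call leadingHidden for the first token of each visited node, so a comment travels with the statement it precedes.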