antlr grammar for triple quoted string

Question

Asked by Brad Baker on 09 September 2017 at 18:06

I am trying to update an ANTLR grammar that implements the following spec:

https://github.com/facebook/graphql/pull/327/files

In logical terms, it's defined as:

StringValue ::
   - `"` StringCharacter* `"`
   - `"""` MultiLineStringCharacter* `"""`

StringCharacter ::
  - SourceCharacter but not `"` or \ or LineTerminator
  - \u EscapedUnicode
  - \ EscapedCharacter

MultiLineStringCharacter ::
  - SourceCharacter but not `"""` or `\"""`
  - `\"""`

(Note: the above is logical notation, not ANTLR syntax.)
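To make the MultiLineStringCharacter rule concrete, here is a minimal hand-written scanner in Java. It is a sketch, not generated ANTLR code: the class and method names (BlockStringScanner, scan) are hypothetical, and it assumes, as in the GraphQL spec, that the escape \""" stands for a literal """ in the string's value.

```java
/** Minimal hand-written scanner for the MultiLineStringCharacter rule (sketch). */
final class BlockStringScanner {

    /**
     * Reads a triple-quoted string that starts at index {@code start} of {@code src}
     * and returns its decoded content. Throws if the string is not terminated.
     * Assumption: an escaped triple quote decodes to a literal triple quote.
     */
    static String scan(String src, int start) {
        if (!src.startsWith("\"\"\"", start)) {
            throw new IllegalArgumentException("expected opening triple quote at " + start);
        }
        StringBuilder out = new StringBuilder();
        int i = start + 3;
        while (i < src.length()) {
            if (src.startsWith("\\\"\"\"", i)) {        // backslash + three quotes: escaped
                out.append("\"\"\"");
                i += 4;
            } else if (src.startsWith("\"\"\"", i)) {   // unescaped triple quote closes the string
                return out.toString();
            } else {                                    // any other SourceCharacter
                out.append(src.charAt(i));
                i++;
            }
        }
        throw new IllegalArgumentException("unterminated triple-quoted string");
    }

    public static void main(String[] args) {
        System.out.println(scan("\"\"\"abc\"\"\"", 0));            // abc
        System.out.println(scan("\"\"\"a \\\"\"\" b\"\"\"", 0));   // a """ b
    }
}
```

The escape is checked before the closing delimiter so that \""" is never mistaken for the end of the string.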

I tried the following in ANTLR 4, but it won't recognize more than one character inside a triple-quoted string:

string : triplequotedstring | StringValue ;

triplequotedstring: '"""' triplequotedstringpart?  '"""';

triplequotedstringpart : EscapedTripleQuote* | SourceCharacter*;

EscapedTripleQuote : '\\"""';

SourceCharacter :[\u0009\u000A\u000D\u0020-\uFFFF];

StringValue: '"' (~(["\\\n\r\u2028\u2029])|EscapedChar)* '"';

With these rules, it recognizes '"""a"""', but as soon as I add more characters, it fails.

For example, '"""abc"""' won't parse, and the ANTLR plugin for IntelliJ reports:

line 1:14 extraneous input 'abc' expecting {'"""', '\\"""', SourceCharacter}

How do I do triple quoted strings in ANTLR with '\"""' escaping?

Original Q&A

There are 2 answers

**Bart Kiers** · Answer 1 · 2017-09-10T11:27:47+00:00

Some of your parser rules should really be lexer rules, and SourceCharacter should probably be a fragment.

Also, instead of EscapedTripleQuote* | SourceCharacter*, you probably want ( EscapedTripleQuote | SourceCharacter )*. The first matches aaa... or bbb..., while you probably meant to match aababbba...
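The difference between the two loop shapes can be checked with java.util.regex, whose * and | operators behave like ANTLR's in this respect. In this sketch, a stands in for EscapedTripleQuote and b for SourceCharacter; the class and method names are hypothetical.

```java
import java.util.regex.Pattern;

final class AlternationDemo {
    // Analogue of  EscapedTripleQuote* | SourceCharacter*  : one kind of element, repeated
    static boolean starOfEach(String s) {
        return Pattern.matches("a*|b*", s);
    }

    // Analogue of  ( EscapedTripleQuote | SourceCharacter )*  : any mix of both
    static boolean starOfAlternation(String s) {
        return Pattern.matches("(?:a|b)*", s);
    }

    public static void main(String[] args) {
        System.out.println(starOfEach("aababbba"));        // false
        System.out.println(starOfAlternation("aababbba")); // true
    }
}
```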

Try something like this instead:

string
 : Triplequotedstring 
 | StringValue 
 ;

Triplequotedstring
 : '"""' TriplequotedstringPart*? '"""'
 ;

StringValue
 : '"' ( ~["\\\n\r\u2028\u2029] | EscapedChar )* '"'
 ;

// Fragments never become a token of their own: they are only used inside other lexer rules
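fragment TriplequotedstringPart : EscapedTripleQuote | SourceCharacter;
fragment EscapedTripleQuote : '\\"""';
fragment SourceCharacter :[\u0009\u000A\u000D\u0020-\uFFFF];
// EscapedChar is used by StringValue but not defined above; this definition is an assumption based on the GraphQL escapes
fragment EscapedChar : '\\' ( ["\\/bfnrt] | 'u' HexDigit HexDigit HexDigit HexDigit );
fragment HexDigit : [0-9a-fA-F];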
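The non-greedy *? in Triplequotedstring matters because SourceCharacter also matches the " character. A rough regex analogue makes this visible; it uses java.util.regex rather than ANTLR, [\s\S] is a simplification of SourceCharacter's range, and the class and field names are hypothetical. On these inputs, a greedy loop swallows everything up to the last """ in the input, while the non-greedy loop stops at the first unescaped """.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LazyVsGreedy {
    // Regex analogues of: '"""' ( EscapedTripleQuote | SourceCharacter )* '"""'
    static final Pattern GREEDY = Pattern.compile("\"\"\"(?:\\\\\"\"\"|[\\s\\S])*\"\"\"");
    // ... and of the non-greedy version with *?
    static final Pattern LAZY = Pattern.compile("\"\"\"(?:\\\\\"\"\"|[\\s\\S])*?\"\"\"");

    /** Returns the first match of {@code p} in {@code input}, or null if there is none. */
    static String firstToken(Pattern p, String input) {
        Matcher m = p.matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String two = "\"\"\"a\"\"\" x \"\"\"b\"\"\"";
        System.out.println(firstToken(GREEDY, two)); // both strings as one match
        System.out.println(firstToken(LAZY, two));   // only the first string
    }
}
```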

**Alex Zerntev** · Answer 2 · 2024-01-18T10:01:43+00:00

Triple-quoted strings are often used to allow multi-line strings and unescaped characters inside a string. If your grammar skips spaces and line breaks, parsing triple quotes can be tricky because of corner cases such as:

- Triple-quoted strings can span multiple lines, so error reporting has to account for that. If you define the whole triple-quoted string, line breaks included, as one lexer token, the reported line (and column) numbers will be wrong.
- For """"""" (a single double quote surrounded by triple quotes), the parsed result should be a string literal whose content is one double-quote character (").

To cope with these issues, a grammar with lexer modes can be used:

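Lexer (modes are only allowed in a separate lexer grammar; the parser grammar then imports its tokens with the tokenVocab option):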

START_TRIPLE_QUOTE: '"""' -> pushMode(INSIDE_TRIPLE_QUOTE);

mode INSIDE_TRIPLE_QUOTE;
TRIPLE_QUOTED_STRING_CONTENT : '"' '"'? ~["]  // Match one or two quotes followed by a non-quote
                             | ~["]           // Match any character that is not a quote
                             ;
TRIPLE_QUOTE_END_2: '"""""' -> popMode;
TRIPLE_QUOTE_END_1: '""""' -> popMode;
TRIPLE_QUOTE_END_0: '"""' -> popMode;

Parser:

triple_string_literal: START_TRIPLE_QUOTE (TRIPLE_QUOTED_STRING_CONTENT)*
                              (TRIPLE_QUOTE_END_2
                              | TRIPLE_QUOTE_END_1
                              | TRIPLE_QUOTE_END_0);

Then, in your listener/visitor, strip the three opening and three closing quotes from the matched text:

TripleQuotedStringConst(ctx.getText().substring(3, ctx.getText().length() - 3))
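To see how the mode rules and the substring step handle the corner cases, here is a plain-Java imitation of the INSIDE_TRIPLE_QUOTE mode. It is a sketch, not ANTLR's generated lexer, and the class and method names are hypothetical. At each position it makes the same longest-match choice an ANTLR lexer makes between TRIPLE_QUOTED_STRING_CONTENT and the three TRIPLE_QUOTE_END tokens, then strips three quotes from each end as the listener code above does.

```java
/** Plain-Java imitation of the INSIDE_TRIPLE_QUOTE lexer mode plus the getText() post-processing. */
final class TripleQuoteModeSketch {

    /** Number of consecutive '"' characters starting at index i. */
    static int countQuotes(String s, int i) {
        int n = 0;
        while (i + n < s.length() && s.charAt(i + n) == '"') n++;
        return n;
    }

    /** Returns the text covered by START_TRIPLE_QUOTE, the content tokens and the chosen end token. */
    static String matchTripleQuoted(String s) {
        if (!s.startsWith("\"\"\"")) throw new IllegalArgumentException("missing opening quotes");
        int i = 3;                                   // START_TRIPLE_QUOTE consumed, now inside the mode
        while (i < s.length()) {
            int quotes = countQuotes(s, i);
            if (quotes >= 3) {
                // TRIPLE_QUOTE_END_0/1/2 match 3, 4 or 5 quotes; the longest match wins
                return s.substring(0, i + Math.min(quotes, 5));
            }
            if (quotes > 0 && i + quotes == s.length()) break;  // 1 or 2 quotes at end of input
            // TRIPLE_QUOTED_STRING_CONTENT: one non-quote, or 1-2 quotes followed by a non-quote
            i += quotes + 1;
        }
        throw new IllegalArgumentException("unterminated triple-quoted string");
    }

    /** The listener/visitor step: drop three quotes from each end of the matched text. */
    static String content(String tokenText) {
        return tokenText.substring(3, tokenText.length() - 3);
    }

    public static void main(String[] args) {
        System.out.println(content(matchTripleQuoted("\"\"\"abc\"\"\"")));  // abc
        System.out.println(content(matchTripleQuoted("\"\"\"\"\"\"\"")));   // "
    }
}
```

For """"""" the four closing quotes form TRIPLE_QUOTE_END_1, so after stripping three quotes from each end the content is a single ".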

For reference, here is an article I wrote on this: https://medium.com/@alexzerntev/parsing-multi-line-triple-quoted-strings-with-antlr4-ceca41cdeadb
