Any ANTLR4 equivalent to the `not` Scala parser combinator?


I'm porting a grammar from Scala parser combinators to ANTLR4. The original grammar uses the `not(p: Parser)` combinator, which succeeds without consuming any input when the enclosed parser fails.
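For reference, here is a minimal sketch of what `not` does, written against a hypothetical hand-rolled parser type (`P`, `lit`, `seq`) rather than the scala-parser-combinators library. `not(p)` is a zero-width negative lookahead: it succeeds at the current position without consuming input when `p` fails, and fails when `p` succeeds. `multilineCommentStart` mirrors the rule of the same name in the grammar below.

```scala
object NotCombinator {
  // Hypothetical minimal parser type (not the scala-parser-combinators API):
  // given the input and a position, return the position after a successful
  // match, or None on failure.
  type P = (String, Int) => Option[Int]

  // Matches a literal string at the current position.
  def lit(s: String): P = (in, pos) =>
    if (in.startsWith(s, pos)) Some(pos + s.length) else None

  // Sequencing: run p, then q from where p stopped.
  def seq(p: P, q: P): P = (in, pos) => p(in, pos).flatMap(next => q(in, next))

  // Negative lookahead: succeeds without consuming input when p fails,
  // and fails when p succeeds.
  def not(p: P): P = (in, pos) =>
    if (p(in, pos).isEmpty) Some(pos) else None

  // Mirrors multilineCommentStart from the question's grammar:
  // an ordinary comment opener that is not a special-comment opener.
  val multilineCommentStart: P = seq(not(lit("/*!")), lit("/*"))
}
```

Because `not` consumes nothing, `seq(not(lit("/*!")), lit("/*"))` only looks ahead three characters before matching the two-character opener. In ANTLR4 the `~` operator complements a set of characters (lexer) or tokens (parser), so lookahead over a multi-character sequence like `/*!` has to be expressed some other way, for example through rule structure.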

In the parser I am porting, I used `not` to tell special comments, which start with `/*!`, apart from standard comments, which start with `/*`. Standard comments (either multiline or end-of-line) must still be allowed inside special comments, and comments may be nested inside comments.

Below is the original Scala code:

```scala
/* Annotation blocks with user defined contents. */
lazy val specialComment: PackratParser[Any] =
  specialCommentBegin ~> rep( not( multilineCommentEnd ) ~ ( comment | specialCommentContents ) ) ~ multilineCommentEnd

/* The whitespace parser, swallows both true whitespace and non-special comments. */
lazy val whitespaceParser: PackratParser[Any] = rep( whiteSpace | comment )

/* Multiline comment start delimiter. */
lazy val multilineCommentStart: PackratParser[Any] = not( specialCommentBegin ) ~ multilineCommentBegin

/* Nested multiline comments. */
lazy val multilineComment: PackratParser[Any] =
  multilineCommentStart ~ rep( not( multilineCommentEnd ) ~ ( comment | any ) ) ~ multilineCommentEnd

/* End of line comments. */
lazy val endOfLineComment: PackratParser[Any] = endOfLineCommentBegin ~ rep( anyButEOL ) ~ "\n"

/* Matches everything except end of line. */
lazy val anyButEOL: PackratParser[Any] = not( "\n" ) ~ any

/* Any comment. */
lazy val comment = multilineComment | endOfLineComment
```

Is there an equivalent to `not` in ANTLR4 (either a built-in operator or a design pattern) that would let me parse inputs such as:

```
/*  /*! this is an interpreted special comment */ that gets discarded because commented out */
```

or

```
/*! this is an interpreted special comment /* containing a comment */ */
```

or

```
/*! a special comment // with end-of-line comments
 * which spans several lines // and again
 * /*  and again
      over several lines
   */
 */
```
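The behavior asked for in these three examples can be sketched as a hand-written recursive scanner. This is an illustration under stated assumptions, not a port of the full Scala grammar: the names (`NestedComments`, `comment`, `body`, `endOfLine`) are hypothetical, a `/*!` nested inside any comment is treated as an ordinary nested comment (so the first example is discarded as a whole), and a `//` comment that reaches end of input without a newline is accepted. The `startsWith("*/", j)` test that runs before each step plays the role of `not( multilineCommentEnd )`, and checking for `/*!` before `/*` plays the role of `not( specialCommentBegin )`.

```scala
object NestedComments {
  sealed trait Kind
  case object Special extends Kind  // starts with "/*!"
  case object Ordinary extends Kind // "/*" or "//" comment

  /** Index just past the newline ending a `//` comment that starts at `from`,
    * or the end of input if there is no newline (an assumption; the Scala
    * grammar requires the newline). */
  private def endOfLine(s: String, from: Int): Int = {
    val nl = s.indexOf('\n', from)
    if (nl < 0) s.length else nl + 1
  }

  /** Scans a block-comment body starting at `from` (just past the opening
    * delimiter). Returns the index just past the matching closing delimiter,
    * or None if the input ends first. */
  private def body(s: String, from: Int): Option[Int] = {
    var j = from
    var result: Option[Int] = None
    var done = false
    while (!done && j < s.length) {
      if (s.startsWith("*/", j)) {
        // Lookahead hit the closing delimiter: the role of not(multilineCommentEnd).
        result = Some(j + 2)
        done = true
      } else if (s.startsWith("/*", j)) {
        // Nested block comment (including "/*!"): recurse, then continue after it.
        body(s, j + 2) match {
          case Some(k) => j = k
          case None    => done = true // nested comment never closed
        }
      } else if (s.startsWith("//", j)) {
        j = endOfLine(s, j) // end-of-line comment; may swallow a "*/" on its line
      } else {
        j += 1 // the `any` case: consume one character
      }
    }
    result
  }

  /** Classifies the comment starting at `at`, returning its kind and the
    * index just past it, or None if no complete comment starts there. */
  def comment(s: String, at: Int = 0): Option[(Kind, Int)] =
    if (s.startsWith("/*!", at)) body(s, at + 3).map(end => (Special, end))
    else if (s.startsWith("/*", at)) body(s, at + 2).map(end => (Ordinary, end))
    else if (s.startsWith("//", at)) Some((Ordinary, endOfLine(s, at)))
    else None
}
```

The returned index makes it easy to check whether a comment covers the whole input (`end == input.length`) or leaves trailing text. Recursion happens only per nesting level, so deep nesting, not input length, drives stack depth.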

Thanks for your help!

Answer by Bart Kiers (accepted):

ANTLR lexer rules can also reference themselves recursively, so you can match each special comment, including any comments it contains, as a single token:

```antlr
SPECIAL_COMMENT
 : '/*!' ( SPECIAL_COMMENT | SL_COMMENT | ML_COMMENT | . )*? '*/'
 ;

fragment SL_COMMENT
 : '//' ~[\r\n]*
 ;

fragment ML_COMMENT
 : '/*' .*? '*/'
 ;
```
Answer by remi:

Here is my final version. It accepts comments nested inside comments and comments inside special comments, but rejects special comments inside special comments. I used the non-greedy closure `*?` to make it work as intended:

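```antlr
// Not in the original answer: assumed declaration so the channel names
// below resolve. Custom channels must be declared in a lexer grammar.
channels { SPEC_CHANNEL, COMMENT_CHANNEL }

SPEC: ( EOL_SPEC | C_SPEC ) -> channel(SPEC_CHANNEL) ;
fragment C_SPEC: '/*!' ( COMMENT | . )*? '*/' ;
fragment EOL_SPEC: '//@' .*? EOL ;

COMMENT: ( EOL_COMMENT | C_COMMENT ) -> channel(COMMENT_CHANNEL) ;
fragment C_COMMENT: '/*' ( COMMENT | . )*? '*/' ;
fragment EOL_COMMENT: '//' .*? EOL ;

// Assumed definition; EOL is referenced but not shown in the original answer.
fragment EOL: '\r'? '\n' ;
```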
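In this grammar an input such as `/*! x */` is matched in full by both `SPEC` (through `C_SPEC`) and `COMMENT` (through `C_COMMENT`, since `/*!` also starts with `/*`); the same holds for `//@` versus `//`. ANTLR's lexer resolves such ambiguities by preferring the longest match and, among equally long matches, the rule defined first, so `SPEC` has to stay above `COMMENT`. The sketch below models only that selection step, with hypothetical names (`MaximalMunch`, `Rule`, `select`) and deliberately simplified, non-nesting matchers; it is not ANTLR's API or its matching algorithm.

```scala
object MaximalMunch {
  /** A lexer rule: returns the end index of its match at `pos`, if any. */
  final case class Rule(name: String, matchAt: (String, Int) => Option[Int])

  /** Longest match wins; on a tie the earlier rule in `rules` is kept. */
  def select(rules: Seq[Rule], in: String, pos: Int): Option[(String, Int)] = {
    val candidates = rules.flatMap(r => r.matchAt(in, pos).map(end => (r.name, end)))
    if (candidates.isEmpty) None
    else Some(candidates.reduceLeft((best, c) => if (c._2 > best._2) c else best))
  }

  /** Simplified block-comment matcher: `open`, then up to the first
    * star-slash, with no nesting (unlike the grammar above). */
  private def block(open: String): (String, Int) => Option[Int] = (in, pos) =>
    if (!in.startsWith(open, pos)) None
    else {
      val close = in.indexOf("*/", pos + open.length)
      if (close < 0) None else Some(close + 2)
    }

  val spec: Rule    = Rule("SPEC", block("/*!"))
  val comment: Rule = Rule("COMMENT", block("/*"))
}
```

Passing the rules in the opposite order flips the result for `/*! x */` from `SPEC` to `COMMENT`, which is the reordering hazard to watch for in the grammar above.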