Is it possible to parse multi-line C-style comments in MGrammar?


I've been hacking around with the May09 Oslo bits and experimenting with tokenizing some source code, but I can't figure out how to correctly handle multi-line C-style comments. For example:

    /*comment*/

Some cases that elude me:

    /***/

or

    /**//**/

I can make one or the other work, but not both. The grammar was:

    module Test {
        language Comments {

            token Comment =
                MultiLineComment;

            token MultiLineComment =
                "/*" MultiLineCommentChar* "*/";

            token MultiLineCommentChar =
                ^ "*" |
                "*" PostAsteriskChar;

            token PostAsteriskChar =
                ^ "*" |
                "*" ^("*" | "/");

            /*
            token PostAsteriskChar =
                ^ "*" |
                "*" PostAsteriskChar;
            */

            syntax Main = Comment*;
        }
    }

The commented-out token is what I think I want, but recursive tokens are not permitted. The fact that MGrammar's own multi-line comments are "broken" (they can't handle /***/) leads me to believe this isn't possible.

Does anyone know otherwise?
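There is one reason to suspect recursion should not be needed. The commented-out rule is right-recursive, and right recursion of that shape amounts to a loop, so block-comment syntax is a regular language. The sketch below uses Python instead of MGrammar, and the functions `scan_block_comment` and `tokenize_comments` are hypothetical, not part of Oslo. It shows that a scanner carrying a single bit of state accepts both troublesome inputs:

```python
def scan_block_comment(text, start=0):
    """Return the index just past a /* ... */ comment that begins at `start`,
    or -1 if no complete block comment begins there."""
    if not text.startswith("/*", start):
        return -1
    i = start + 2          # skip the opening "/*" so its '*' can't close "/*/"
    prev_was_star = False  # the only state: was the previous character '*'?
    while i < len(text):
        ch = text[i]
        if prev_was_star and ch == "/":
            return i + 1   # found "*/": the comment ends here
        prev_was_star = ch == "*"
        i += 1
    return -1              # ran off the end: unterminated comment


def tokenize_comments(text):
    """Split a string made only of back-to-back block comments into tokens."""
    tokens = []
    pos = 0
    while pos < len(text):
        end = scan_block_comment(text, pos)
        if end == -1:
            raise ValueError(f"no complete comment at offset {pos}")
        tokens.append(text[pos:end])
        pos = end
    return tokens


print(tokenize_comments("/*comment*/"))  # ['/*comment*/']
print(tokenize_comments("/***/"))        # ['/***/']
print(tokenize_comments("/**//**/"))     # ['/**/', '/**/']
```

Because the only thing remembered between characters is whether the previous one was `*`, a run of stars before `/` always closes the comment. That is why `/***/` ends correctly and `/**//**/` splits into two tokens.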



Answer by Sam:

Here is how I have done it (not all my own code, but I can't find a reference to the original author):

    // Whitespace must be defined elsewhere in the language (not shown here).
    interleave Skippable = Whitespace | Comment;
    interleave Comment = CommentToken;

    @{Classification["Comment"]}
    token CommentToken = CommentDelimited
                       | CommentLine;

    token CommentDelimited = "/*" CommentDelimitedContent* "*/";
    token CommentDelimitedContent
        = ^('*')
        | '*' ^('/');

    token CommentLine = "//" CommentLineContent*;
    token CommentLineContent
        = ^(
             '\u000A' // New Line
           | '\u000D' // Carriage Return
           | '\u0085' // Next Line
           | '\u2028' // Line Separator
           | '\u2029' // Paragraph Separator
          );

This handles both single-line (//) and multi-line (/* */) comments.
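To experiment with token definitions like these outside Oslo, here is a rough translation using Python's standard `re` module. It mirrors the structure of the grammar above: a comment token is either a delimited or a line comment, and whitespace is skipped the way an interleave rule would skip it. It is only a sketch. The `WHITESPACE` set and the `comments()` helper are assumptions, and the delimited pattern is the classic `\*+` form rather than a literal translation of `'*' ^('/')`. Read as a regular expression, that alternative appears to reject `/***/`, because the middle `*` has no following non-`/` character to pair with.

```python
import re

# Delimited comment (classic non-recursive pattern): "/*", text without '*',
# a run of '*'; then zero or more segments that start with a character other
# than '/' or '*' and end in another run of '*'; finally the closing '/'.
# Because '*' runs are matched with '\*+', "/***/" is accepted.
COMMENT_DELIMITED = r"/\*[^*]*\*+(?:[^/*][^*]*\*+)*/"

# Line comment: "//" up to (not including) one of the five line terminators above.
COMMENT_LINE = r"//[^\n\r\u0085\u2028\u2029]*"

# Assumed stand-in for the Whitespace token, which the grammar does not define.
WHITESPACE = r"[ \t\n\r\u0085\u2028\u2029]+"

TOKEN_RE = re.compile(
    rf"(?P<comment>{COMMENT_DELIMITED}|{COMMENT_LINE})|(?P<ws>{WHITESPACE})"
)


def comments(text):
    """Return the comment tokens in `text`, skipping whitespace the way an
    interleave rule would; any other input raises ValueError."""
    found = []
    pos = 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected input at offset {pos}")
        if m.lastgroup == "comment":
            found.append(m.group())
        pos = m.end()
    return found


src = "/***/ /**//**/\n// line comment\n/* multi\n   line */"
print(comments(src))
# ['/***/', '/**/', '/**/', '// line comment', '/* multi\n   line */']
```

The two comment forms start differently (`/*` versus `//`), so the order of the alternatives in `TOKEN_RE` does not affect which one matches.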