Parsing blocks of line comments using MGrammar

Question

201 views · Asked by Magnus Lindhe on 08 June 2010 at 18:36

How can I parse blocks of line comments with MGrammar?

I want to parse blocks of line comments. Line comments that are next to each other should be grouped in the MGraph output.

I'm having trouble grouping blocks of line comments together. My current grammar uses "\r\n\r\n" to terminate a block, but that does not work in all cases, for example at the end of the file or when I introduce other syntax rules.

Sample input could look like this:

/// This is block
/// number one

/// This is block
/// number two

My current grammar looks like this:

module MyModule
{
    language MyLanguage
    {       
        syntax Main = CommentLineBlock*;

        token CommentContent = !(
                                 '\u000A' // New Line
                                 |'\u000D' // Carriage Return
                                 |'\u0085' // Next Line
                                 |'\u2028' // Line Separator
                                 |'\u2029' // Paragraph Separator
                                );   

        token CommentLine = "///" c:CommentContent* => c;
        syntax CommentLineBlock = (CommentLine)+ "\r\n\r\n";

        interleave Whitespace = " " | "\r" | "\n";   
    }
}

Original Q&A

There is 1 answer

**Lars Corneliussen** · Answer 1 · 2010-07-20T06:06:14+00:00

The problem is that you interleave all whitespace, so by the time the tokens reach the parser, the line breaks just "don't exist" anymore.

CommentLineBlock is a syntax rule in your case, but you need the comment blocks to be consumed entirely at the token level:

language MyLanguage
{       
    syntax Main = CommentLineBlock*;

    token LineBreak = '\u000D\u000A'
                         | '\u000A' // New Line
                         |'\u000D' // Carriage Return
                         |'\u0085' // Next Line
                         |'\u2028' // Line Separator
                         |'\u2029' // Paragraph Separator
                        ;  

    token CommentContent = !(
                             '\u000A' // New Line
                             |'\u000D' // Carriage Return
                             |'\u0085' // Next Line
                             |'\u2028' // Line Separator
                             |'\u2029' // Paragraph Separator
                            );   

    token CommentLine = "//" c:CommentContent*;
    token CommentLineBlock = c:(CommentLine LineBreak?)+ => Block {c};

    interleave Whitespace = " " | "\r" | "\n";   
}

The problem then is that the sub-token rules in CommentLine won't be processed; you just get plain strings in the output:

Main[
  [
    Block{
      "/// This is block\r\n/// number one\r\n"
    },
    Block{
      "/// This is block\r\n/// number two"
    }
  ]
]

I might try to find a nicer way tonight :-)
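
For readers who want to check the grouping logic without the MGrammar tooling, here is a rough Python sketch that mirrors the token rules above as a regular expression. It is not MGrammar and not part of the original answer. The names LINE_BREAK, COMMENT_LINE, COMMENT_BLOCK, find_comment_blocks and split_block are made up for illustration. A block ends as soon as the next characters are not another comment line, so a blank line or the end of the input both terminate it.

```python
import re

# Same line-break characters as the LineBreak token in the grammar above.
LINE_BREAK = r"(?:\r\n|[\n\r\u0085\u2028\u2029])"
# "//" followed by any run of non-line-break characters (mirrors CommentLine).
COMMENT_LINE = r"//[^\n\r\u0085\u2028\u2029]*"
# One or more comment lines, each optionally followed by a single line break
# (mirrors CommentLineBlock).
COMMENT_BLOCK = re.compile(rf"(?:{COMMENT_LINE}{LINE_BREAK}?)+")


def find_comment_blocks(text):
    """Return each run of adjacent comment lines as one plain string."""
    return COMMENT_BLOCK.findall(text)


def split_block(block):
    """Hypothetical post-processing step: split a block string into its
    lines and drop the leading slashes and surrounding spaces."""
    lines = re.split(LINE_BREAK, block)
    return [line.lstrip("/").strip() for line in lines if line]


sample = (
    "/// This is block\r\n/// number one\r\n"
    "\r\n"
    "/// This is block\r\n/// number two"
)
blocks = find_comment_blocks(sample)
for block in blocks:
    print(repr(block), split_block(block))
```

The two strings in blocks correspond to the two Block nodes shown above. split_block is one possible way to work around the plain-string limitation after parsing. Like the grammar, the sketch assumes each comment line starts at the beginning of a line; indented comments would need extra handling.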
