Writing a parser with M, consume while not rule

Question

Asked by John Leidegren on 25 November 2009 at 07:06 · 97 views

I'm writing an HTML parser for my own amusement and I wanted to try out M.

I'm basing this work on the HTML 4.01 standard, which says:

Although the STYLE and SCRIPT elements use CDATA for their data model, for these elements, CDATA must be handled differently by user agents. Markup and entities must be treated as raw text and passed to the application as is. The first occurrence of the character sequence "</" (end-tag open delimiter) is treated as terminating the end of the element's content. In valid documents, this would be the end tag for the element.
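
To make the quoted rule concrete, here is a minimal sketch in Python (not M) of what it asks a user agent to do. The function name raw_text_end is hypothetical, and the sketch assumes scanning starts right after the opening tag.

```python
def raw_text_end(html: str, start: int) -> int:
    """Return the index where SCRIPT/STYLE content ends under the quoted rule.

    The content runs from `start` up to (not including) the first "</".
    If no "</" is found, the content is taken to run to the end of the input
    (an assumption of this sketch; the quoted text does not cover that case).
    """
    end = html.find("</", start)
    return len(html) if end == -1 else end


doc = "<script>if (a < b) { go(); }</script>"
start = len("<script>")
end = raw_text_end(doc, start)
print(doc[start:end])  # if (a < b) { go(); }
```

A lone "<" inside the script, as in "a < b", does not end the content; only the two-character sequence "</" does.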

I thought about it for a while, and what I really want to do is something like this:

syntax Main 
    = "<script>" Script "</script>"
    ;
token Script
    = TakeWhileNot("</") // this is not valid M grammar
    ;

I keep finding that I want some kind of tokenization rule that matches until it reaches an open angle bracket < followed by a forward slash /.

If the terminating sequence were a single character, this would not be a problem, because then I could have written this:

token Script
    = ScriptEscape+
    ;
token ScriptEscape
    = !"<"
    ;

And that would work. I'm not sure I'm going about this the right way, but the problem is essentially that one language is embedded in another. I don't care about the script language in this case, so I simply want to skip ahead.

There is 1 answer

**John Leidegren** · Accepted Answer · 2009-11-26T20:48:52+00:00

I figured out this neat trick, which wasn't entirely obvious...

syntax Main 
    = "<script>" Script* "</script>"
    ;
token Script
    = !('<')
    | '<' !('/')
    ;

That's valid MGrammar, and it reads as:

Take any character that is NOT '<', OR take a '<' that is NOT followed by '/'

Repeated with Script* in Main, this consumes everything up to the first "</" without consuming the "</" itself.
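
For readers without the M tooling, the same idea can be sketched as a hand-written scanner in Python. This is an illustration of the rule as described in prose, not a translation of how M executes it. The name scan_script is hypothetical, and the sketch treats "not followed by" as a one-character lookahead; whether the M rule also consumes that following character is not verified here.

```python
def scan_script(text: str, pos: int = 0) -> int:
    """Consume characters until "</" is next; return the index where scanning stops."""
    n = len(text)
    while pos < n:
        if text[pos] != "<":
            pos += 1  # first alternative: any character except '<'
        elif pos + 1 < n and text[pos + 1] != "/":
            pos += 1  # second alternative: a '<' that is not followed by '/'
        else:
            break     # "</" (or a trailing '<') is next: stop without consuming it
    return pos


body = 'var s = "<b>" + x;</script>'
stop = scan_script(body)
print(body[:stop])  # var s = "<b>" + x;
print(body[stop:])  # </script>
```

The scanner stops in front of the "</", so the closing "</script>" literal is still there for the surrounding rule to match.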
