Scintilla.NET regular expression based syntax highlighting


Is it possible to use regular expressions to define syntax highlighting in Scintilla? If so, how?

I have a custom language to process, which cannot be described in simple terms of keywords and delimiters. The meaning of a particular structure in this language depends only on its position relative to keywords. I already have a regular-expression-based parser for this format; all I need is to apply the rules defined by those regular expressions as text styles.

That is, if something matches regex1, it should get style1. Is that possible? How?

If not, can I set styles for manually selected ranges, i.e. assign a style number to a specified character range in the editor? How?

Is it possible to define Scintilla styles in code rather than in an XML file?

EDIT: OK, I've found a way.

foreach (Match m in Patterns.Keyword0.Matches(Encoding.ASCII.GetString(e.RawText)))
    e.GetRange(m.Index, m.Index + m.Length).SetStyle(1);

The problem is the RawText property. It is a byte buffer of UTF-8 encoded text. The Text property contains nice UTF-16 text, but the GetRange method accepts byte offsets, not character offsets. If I convert the text on each TextChanged event, I lose almost all of the speed advantage of using Scintilla.

Of course, the easiest way would be to change the internal encoding to UTF-16, but when I do, I get an exception saying this encoding is not supported. The only supported encoding seems to be UTF-8, which is ridiculously hard (and slow) to process.

I'm hitting a wall here.


There is 1 answer

Answer by ekhumoro

The key to this is to set the lexer to SCLEX_CONTAINER and then handle the SCN_STYLENEEDED notification. This means you only ever have to process the text that actually needs styling.

There are several guides linked at the top of the Scintilla Documentation that detail various aspects of implementing custom lexers, so I won't bother repeating any of that here.

As for performance: I've written custom Scintilla lexers in Python that decode the UTF-8 text when styling and have never noticed any significant issues, so I'd be amazed if you couldn't at least match that using C#.
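
To make the offset handling concrete, here is a minimal sketch in Python, the language of the lexers mentioned above. It is not tied to any Scintilla binding: it only shows how regex matches on the decoded text could be mapped back to UTF-8 byte offsets, decoding just the byte range that needs styling. The names `RULES` and `style_runs` are hypothetical, the two patterns are placeholders for your own rules, and the sketch assumes the range passed in starts and ends on character boundaries (for example at line starts). In C#, `Encoding.UTF8.GetByteCount` should be able to play the role of `len(s.encode("utf-8"))`.

```python
import re

# Hypothetical rule table: (style number, compiled pattern).
# These are placeholders, not part of any Scintilla API.
RULES = [
    (1, re.compile(r"\b(?:begin|end)\b")),  # keywords
    (2, re.compile(r"#[^\n]*")),            # line comments
]


def style_runs(raw, start, end, rules=RULES):
    """Return (byte_offset, byte_length, style) for matches in raw[start:end].

    Assumes start and end fall on UTF-8 character boundaries.
    """
    # Decode only the slice that needs styling, not the whole document.
    text = raw[start:end].decode("utf-8")

    # Collect matches from all rules, ordered by character position.
    matches = sorted(
        (m.start(), m.end(), style)
        for style, pattern in rules
        for m in pattern.finditer(text)
    )

    runs = []
    char_pos = 0      # characters already converted to bytes
    byte_pos = start  # byte offset that corresponds to char_pos
    for m_start, m_end, style in matches:
        if m_start < char_pos:
            continue  # skip a match that overlaps an earlier one
        # Advance by the encoded size of the gap, so each character
        # of the slice is encoded at most once overall.
        byte_pos += len(text[char_pos:m_start].encode("utf-8"))
        length = len(text[m_start:m_end].encode("utf-8"))
        runs.append((byte_pos, length, style))
        byte_pos += length
        char_pos = m_end
    return runs


if __name__ == "__main__":
    sample = "żółw begin # ü\nend".encode("utf-8")
    for run in style_runs(sample, 0, len(sample)):
        print(run)
```

Each returned tuple is meant to be fed to whatever "style this byte range" call your binding provides, from inside the style-needed handler. Because the conversion walks forward through the slice once, its cost is proportional to the size of the range being styled rather than to the size of the document.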