Creating a syntax highlighter in Python (PyQt4)

I've been searching the internet for how text editors do syntax highlighting, and I read about lexers and Yacc. I'm quite confused about the concepts behind syntax highlighting.

I've created a simple text editor using PyQt4, and I want it to support syntax highlighting for programming languages such as HTML, CSS, Python, and C/C++. But I have no clue how or where to start implementing this. Could someone please point me in the right direction and clear up my doubts about syntax highlighting?

There are 3 answers

1
Peter Westlake (best answer)

You need to divide the text into lexical tokens (words, numbers, symbols, and so on), find out what each one is, and colour it accordingly. It's easy enough to recognize numbers and symbols, but to know whether a word is a variable, a function, a keyword or whatever means parsing the text according to the syntactical rules of the language. That's why your search finds references to lexical analysis (Lex) and parsing (Yacc). Lexical analysis is about assembling letters and symbols into words and other tokens, and parsing is about how those tokens go together to make up a syntactically valid program.
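
To make the lexical analysis step concrete, here is a minimal sketch of a hand-written lexer using only the standard `re` and `keyword` modules. The token classes, the `TOKEN_RE` pattern, and the `lex` helper are my own illustrative choices, not part of any library, and they assume Python 3. It can tell keywords from other words by lookup, but it cannot tell a variable from a function name; that is the part that needs a parser.

```python
import keyword
import re

# One alternative per token class; the group name becomes the token kind.
TOKEN_RE = re.compile(r"""
    (?P<number>\d+(?:\.\d+)?)      # integer or decimal literal
  | (?P<word>[A-Za-z_]\w*)         # identifier or keyword
  | (?P<symbol>[^\w\s])            # any single punctuation character
""", re.VERBOSE)


def lex(source):
    """Yield (kind, text, start) for each token, skipping whitespace."""
    for match in TOKEN_RE.finditer(source):
        kind = match.lastgroup
        text = match.group()
        # A plain lexer can only recognise keywords by looking them up.
        if kind == "word" and keyword.iskeyword(text):
            kind = "keyword"
        yield kind, text, match.start()


tokens = list(lex("if x1 >= 42: return y"))
for kind, text, start in tokens:
    print(start, kind, text)
```

Each tuple gives you a start offset and a length (`len(text)`), which is all an editor needs in order to colour that range.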

Python has a library module, tokenize, that does exactly what you need for the Python language. The documentation even says that it is useful for pretty-printing and colouring on-screen displays. Hopefully, using that will give you more of an idea how all this stuff works. Then you can either search for Python libraries for parsing other languages, or have a go at writing one yourself.
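
As a rough illustration of what `tokenize` gives you, the sketch below maps its tokens to colouring categories. It assumes Python 3 (where tokens are named tuples); the `classify` helper and the category names are hypothetical and only there for the example.

```python
import io
import keyword
import token
import tokenize


def classify(source):
    """Return (category, text, (row, col)) tuples for Python source text."""
    result = []
    reader = io.StringIO(source).readline
    for tok in tokenize.generate_tokens(reader):
        if tok.type == token.NAME:
            # tokenize reports keywords as plain NAME tokens.
            category = "keyword" if keyword.iskeyword(tok.string) else "name"
        elif tok.type in (token.NUMBER, token.STRING, token.OP):
            category = token.tok_name[tok.type].lower()
        elif tok.type == tokenize.COMMENT:
            category = "comment"
        else:
            continue  # NEWLINE, INDENT, DEDENT, ENDMARKER: nothing to colour
        result.append((category, tok.string, tok.start))
    return result


for category, text, start in classify("x = 42  # answer\n"):
    print(category, repr(text), start)
```

The `(row, col)` start position, together with the length of the token text, tells the editor which characters to colour.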

There's a Stack Overflow question here that suggests pyPEG for parsing other languages. Jimothy's suggestion of Pygments is good too.

0
ekhumoro

If you want to make your life easy, use QScintilla - it does everything you need and more straight out of the box.

QScintilla is included with the PyQt binary installers for Windows (which can be found here), and almost all Linux distros will have QScintilla packages in their repositories. Alternatively, the QScintilla source code can be found here.

And here's a minimal QScintilla example that shows how easy it is to get started:

import sys, os
from PyQt4 import QtGui, Qsci

class Window(Qsci.QsciScintilla):
    def __init__(self):
        Qsci.QsciScintilla.__init__(self)
        # Attach the built-in Python lexer; it does all the highlighting.
        self.setLexer(Qsci.QsciLexerPython(self))
        # Show this script's own source as sample text.
        with open(os.path.abspath(__file__)) as source:
            self.setText(source.read())

if __name__ == '__main__':

    app = QtGui.QApplication(sys.argv)
    window = Window()
    window.setGeometry(500, 300, 500, 500)
    window.show()
    sys.exit(app.exec_())
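
Since the question asks about HTML, CSS, Python and C/C++, one possible way to choose a QScintilla lexer is by file extension. The sketch below is only an assumption about how you might organise it: `LEXER_NAMES` and `lexer_name_for` are hypothetical names, while the lexer classes they refer to (`QsciLexerPython`, `QsciLexerHTML`, `QsciLexerCSS`, `QsciLexerCPP`) are the ones QScintilla provides. The lookup itself runs without Qt installed.

```python
import os

# QScintilla lexer class names keyed by file extension. The mapping and the
# helper name are illustrative; extend them for the languages you need.
LEXER_NAMES = {
    ".py": "QsciLexerPython",
    ".html": "QsciLexerHTML",
    ".htm": "QsciLexerHTML",
    ".css": "QsciLexerCSS",
    ".c": "QsciLexerCPP",    # QsciLexerCPP covers both C and C++
    ".h": "QsciLexerCPP",
    ".cpp": "QsciLexerCPP",
}


def lexer_name_for(path):
    """Return the QScintilla lexer class name for path, or None if unknown."""
    extension = os.path.splitext(path)[1].lower()
    return LEXER_NAMES.get(extension)


# In the editor itself (requires PyQt4 with QScintilla installed):
#     name = lexer_name_for(filename)
#     if name is not None:
#         editor.setLexer(getattr(Qsci, name)(editor))
print(lexer_name_for("style.css"))
```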
0
Abdul Rehman

I know this question has already been answered, but many other new users will, like me, come here and find that the options mentioned in the answers above are somewhat advanced. I am posting this answer as a reference for other newcomers.

Creating a syntax highlighter with Python and Qt is a good choice: Python is a very powerful language, and Qt is a great framework for GUI application development.
At its simplest, a syntax highlighter is a set of regular expressions applied to the document of a QTextEdit. You match each regular expression against a block of text, choose a QTextCharFormat for that kind of match, and apply the format to the matched range of the block. Here is a code example of a very simple syntax highlighter implemented in Python using Qt4: the highlightBlock function of a syntaxHighlighter class derived from QSyntaxHighlighter.

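    # Method of a QSyntaxHighlighter subclass (needs "from PyQt4 import QtCore").
    # Expects self.highlightingRules, self.commentStartExpression,
    # self.commentEndExpression and self.multiLineCommentFormat to be set
    # up in __init__.
    def highlightBlock(self, text):
        # Single-line rules: format every match of every pattern.
        for pattern, fmt in self.highlightingRules:
            expression = QtCore.QRegExp(pattern)
            index = expression.indexIn(text)
            while index >= 0:
                length = expression.matchedLength()
                self.setFormat(index, length, fmt)
                index = expression.indexIn(text, index + length)

        # Block state 0 = normal text, 1 = inside a multi-line comment.
        self.setCurrentBlockState(0)

        # If the previous block ended inside a comment, this one starts in it.
        startIndex = 0
        if self.previousBlockState() != 1:
            startIndex = self.commentStartExpression.indexIn(text)

        while startIndex >= 0:
            endIndex = self.commentEndExpression.indexIn(text, startIndex)

            if endIndex == -1:
                # The comment does not end in this block.
                self.setCurrentBlockState(1)
                commentLength = len(text) - startIndex
            else:
                commentLength = endIndex - startIndex + self.commentEndExpression.matchedLength()

            self.setFormat(startIndex, commentLength,
                    self.multiLineCommentFormat)
            startIndex = self.commentStartExpression.indexIn(text,
                    startIndex + commentLength)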

Using this example, I have created an assembly syntax highlighter for the 8051 microcontroller in Python with Qt4. You can refer to that code as a good starting point.
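
If you want to experiment with the logic of highlightBlock without Qt installed, the same idea can be sketched with the standard `re` module. This is not the Qt API: `highlight_line`, `RULES`, the format labels and the C-style comment delimiters are all assumptions made up for the sketch. It returns the (start, length, label) ranges that `setFormat` would receive, plus the block state to pass to the next line.

```python
import re

# Single-line rules: (compiled pattern, format label). The patterns and
# labels are made up for this sketch.
RULES = [
    (re.compile(r"\b(?:if|else|while|return)\b"), "keyword"),
    (re.compile(r"\b\d+\b"), "number"),
]
COMMENT_START = re.compile(r"/\*")
COMMENT_END = re.compile(r"\*/")
NORMAL, IN_COMMENT = 0, 1


def highlight_line(text, previous_state):
    """Return (spans, state); spans are (start, length, label) tuples."""
    spans = []
    for pattern, label in RULES:
        for match in pattern.finditer(text):
            spans.append((match.start(), match.end() - match.start(), label))

    state = NORMAL
    # If the previous line ended inside a comment, this line starts in one.
    if previous_state == IN_COMMENT:
        start = 0
    else:
        found = COMMENT_START.search(text)
        start = found.start() if found else -1
    while start >= 0:
        end = COMMENT_END.search(text, start)
        if end is None:
            state = IN_COMMENT          # comment runs past the end of the line
            length = len(text) - start
        else:
            length = end.end() - start
        spans.append((start, length, "comment"))
        found = COMMENT_START.search(text, start + length)
        start = found.start() if found else -1
    return spans, state


state = NORMAL
for line in ["x = 1 /* start", "still inside", "end */ return 3"]:
    spans, state = highlight_line(line, state)
    print(spans, state)
```

Comment spans are appended last, so when the ranges are applied in order they override any keyword or number formatting inside a comment, just as the later `setFormat` call does in the Qt version.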