How to create a syntax highlighted .docx with python?

44 views Asked by At

iam new to python and currently trying to write a program that reads multiple .xml files from the file explorer and displays them in .docx file. (Formatted and colour highlighted)

I tried using python-docx and lxml to first read the xml as a formatted string and afterwards write it into the .docx file.

Unfortunately i cant figure out a way to colour highlight the text in the .docx file like for example when using notepad++.

Maybe you guys have some ideas?

My approach:

from docx import Document
from lxml import etree

store_location = "xyz.docx"
template_path = "abc.docx"

doc = Document(template_path)

xml_et = etree.parse("example.xml")
xml_string = etree.tostring(xml_et,pretty_print=True,encoding=str)
doc.add_paragraph(xml_string)
        
doc.save(store_location)
1

There are 1 answers

0
yoann guillard On

Basically, you have to put colour on each word or character which have a special meaning in XML. That's huge work. So firstly you'll need a syntax highlighter. In python, the game breaker one is Pygments. You'll use the XML Lexer.

Once you have your code highlighted, you have to insert it in your document. You have then 2 solutions :

  • Select the Formatter you prefer and iterate over token then add it to your document through python-docx interface.
  • Directly add xml representation supported by OpenDocument. But it seems there's no Formatter for OpenDocument, so you'll have to create a bridge with one existing Formatter or even better develop the new output for Pygments.

I think the best solution is the second one. Firstly, you should take a quick look at ISO/IEC 26300 OpenDocument, then open a question or an issue on the Pygments' git.