Can't get lxml to pretty-print without turning empty XML elements into self-closing elements


I'm currently facing a dilemma.

The code below doesn't print my XML properly:

import lxml.etree

xml_tree = lxml.etree.parse("myFile.xml")
root = xml_tree.getroot()

for fruit in root:
    if fruit.tag == "apple":
        for apple in fruit:
            if apple.tag == "McIntosh":
                fruit.remove(apple)

tree = lxml.etree.ElementTree(root)
tree.write("output.xml", pretty_print=True, xml_declaration=True, encoding="utf-8")

Here is my XML input file:

<fruits>
  <apple>
    <McIntosh/>
  </apple>
</fruits>

And here is my (ugly) output XML file; the indentation is not correct:

<?xml version='1.0' encoding='UTF-8'?>
<fruits>
  <apple>
    </apple>
</fruits>
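To see where the stray indentation comes from, here is a minimal sketch that inspects the whitespace around <McIntosh/> and then removes it the same way the loop above does. It parses an inline copy of myFile.xml instead of the file (an assumption, to keep it self-contained).

```python
import lxml.etree

# Inline copy of myFile.xml (assumption: same content as in the question).
XML = b"""<fruits>
  <apple>
    <McIntosh/>
  </apple>
</fruits>"""

root = lxml.etree.fromstring(XML)  # default parser keeps whitespace text
apple = root.find("apple")
mcintosh = apple.find("McIntosh")

# The indentation lives in text/tail strings, not in the elements themselves.
print(repr(apple.text))     # whitespace before <McIntosh/>
print(repr(mcintosh.tail))  # whitespace after <McIntosh/>

# remove() takes the element's tail with it, but apple.text stays behind.
apple.remove(mcintosh)
serialized = lxml.etree.tostring(root).decode()
print(serialized)
```

Removing <McIntosh/> also drops its tail (the newline before </apple>), while the newline and four spaces stored in apple.text remain. As far as I know, pretty_print does not rewrite whitespace that is already in the tree, which would explain the output above.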

I read somewhere that to get pretty printing to actually work, I had to use an lxml.etree.XMLParser with the remove_blank_text=True option, like this:

xml_parser = lxml.etree.XMLParser(remove_blank_text=True)
xml_tree = lxml.etree.parse("myFile.xml", xml_parser)

This does make pretty printing work, but as a side effect my empty XML elements are now turned into self-closing elements:

<?xml version='1.0' encoding='UTF-8'?>
<fruits>
  <apple/>
</fruits>
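A minimal sketch of what happens to <apple> in this case, again parsing an inline copy of myFile.xml (an assumption): with remove_blank_text=True no whitespace-only text survives parsing, so once <McIntosh/> is removed, <apple> has neither child elements nor text, and lxml writes such an element in the short <apple/> form.

```python
import lxml.etree

# Inline copy of myFile.xml (assumption: same content as in the question).
XML = b"""<fruits>
  <apple>
    <McIntosh/>
  </apple>
</fruits>"""

# remove_blank_text=True discards whitespace-only text between elements.
parser = lxml.etree.XMLParser(remove_blank_text=True)
root = lxml.etree.fromstring(XML, parser)
apple = root.find("apple")
apple.remove(apple.find("McIntosh"))

# <apple> now has no child elements and its text is None.
print(repr(apple.text))
pretty = lxml.etree.tostring(root, pretty_print=True).decode()
print(pretty)
```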

Does anyone know how to fix this side effect of lxml pretty printing?

Answer by I like Bananas:

As mentioned in this question you just asked, use the lxml.etree.indent() function (available since lxml 4.5) to fix your pretty printing.

As for avoiding self-closing tags: an element without children is written as self-closing when its text property is None. Set the text to an empty string instead, as answered there.
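A minimal sketch combining both suggestions, assuming lxml 4.5 or later for indent() and parsing an inline copy of myFile.xml instead of the file. Filling in an empty string for every childless element whose text is None is my own assumption about which elements should stay non-self-closing; narrow that loop if some elements should keep the short form.

```python
import lxml.etree

# Inline copy of myFile.xml (assumption: same content as in the question).
XML = b"""<fruits>
  <apple>
    <McIntosh/>
  </apple>
</fruits>"""

# Drop the original whitespace so indent() can regenerate it cleanly.
parser = lxml.etree.XMLParser(remove_blank_text=True)
root = lxml.etree.fromstring(XML, parser)

for fruit in root:
    if fruit.tag == "apple":
        # Iterate over a snapshot of the children while removing some of them.
        for apple in list(fruit):
            if apple.tag == "McIntosh":
                fruit.remove(apple)

# Only real elements (not comments or PIs): an empty string instead of None
# makes lxml write <tag></tag> rather than <tag/>.
for element in root.iter(lxml.etree.Element):
    if len(element) == 0 and element.text is None:
        element.text = ""

# Re-create the indentation; leaf elements' text is left untouched.
lxml.etree.indent(root, space="  ")

output = lxml.etree.tostring(
    root, xml_declaration=True, encoding="utf-8"
).decode("utf-8")
print(output)
```

To write the file as in the question, root.getroottree().write("output.xml", xml_declaration=True, encoding="utf-8") should produce the same layout.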