I'm currently facing a dilemma.
The code below doesn't print my XML properly:
import lxml.etree
xml_tree = lxml.etree.parse("myFile.xml")
root = xml_tree.getroot()
for fruit in root:
    if fruit.tag == "apple":
        for apple in fruit:
            if apple.tag == "McIntosh":
                fruit.remove(apple)
tree = lxml.etree.ElementTree(root)
tree.write("output.xml", pretty_print=True, xml_declaration=True, encoding="utf-8")
Here is my XML input file:
<fruits>
<apple>
<McIntosh/>
</apple>
</fruits>
And here is my (ugly) output XML file; the indentation is not correct:
<?xml version='1.0' encoding='UTF-8'?>
<fruits>
<apple>
</apple>
</fruits>
I read somewhere that to get pretty printing to actually work, I have to use an lxml.etree.XMLParser with the remove_blank_text=True option, like this:
xml_parser = lxml.etree.XMLParser(remove_blank_text=True)
xml_tree = lxml.etree.parse("myFile.xml", xml_parser)
This does activate pretty printing, but now my empty XML elements are turned into self-closing elements:
<?xml version='1.0' encoding='UTF-8'?>
<fruits>
  <apple/>
</fruits>
Does anyone know how to fix this side effect of lxml pretty printing?
As mentioned in the question you just asked, use the indent function to fix your pretty printing. As for creating non-self-closing tags: an element is serialized as self-closing because its text property is None. Just set it to an empty string, as answered there.
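A minimal sketch of the empty-string approach, assuming the input from the question is parsed from an in-memory string (the XML_INPUT name is only for illustration) and that the remove_blank_text parser is used for pretty printing rather than a separate indent function:

```python
import lxml.etree

# Hypothetical in-memory copy of the input file from the question.
XML_INPUT = b"""<fruits>
  <apple>
    <McIntosh/>
  </apple>
</fruits>"""

# remove_blank_text=True drops whitespace-only text nodes,
# which lets pretty_print re-indent the tree on output.
parser = lxml.etree.XMLParser(remove_blank_text=True)
root = lxml.etree.fromstring(XML_INPUT, parser)

for fruit in root:
    if fruit.tag == "apple":
        # Iterate over a copy, because children are removed inside the loop.
        for apple in list(fruit):
            if apple.tag == "McIntosh":
                fruit.remove(apple)
        # A childless element whose text is None is written as <apple/>;
        # an empty string keeps the explicit closing tag.
        if len(fruit) == 0 and fruit.text is None:
            fruit.text = ""

output = lxml.etree.tostring(
    root, pretty_print=True, xml_declaration=True, encoding="utf-8"
)
print(output.decode("utf-8"))
```

The same check can be applied to any element that ends up empty after removals; elements that were already self-closing in the input will be rewritten too if you set their text to an empty string.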