I want to wrap tag.text into CDATA:
<?xml version="1.0" encoding="utf-8" ?>
<root>
<tag>
some data
<!-- some data2 -->
<!-- some data2 -->
some data
</tag>
</root>
But when I read tag.text from an element that contains comments, I only get the text before the first comment:
from lxml import etree
parser = etree.XMLParser()
#parser = etree.XMLParser(remove_comments=True)
tree = etree.parse("./data.xml", parser)
root = tree.getroot()
for tag in root.findall("tag"):
    tag.text = etree.CDATA(tag.text)
tree.write("./result.xml",
           encoding="utf-8",
           xml_declaration=True)
And I get this (tag.text contains only the first "some data"):
<?xml version='1.0' encoding='UTF-8'?>
<root>
<tag><![CDATA[
some data
]]><!-- some data2 -->
<!-- some data2 -->
some data
</tag>
</root>
How can I fix it?
If you want to concatenate all of the text within the <tag> elements, you can call str.join on the result of the element's itertext method. This joins all of the text, including whitespace, before it is passed to CDATA.
The comments are considered child elements of the <tag> element in your example, so tag.text holds only the text before the first comment, and the text after each comment is stored as that comment's tail. The tail text is included when you iterate with the itertext method.
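A minimal sketch of that approach, with the question's XML inlined as a string so it runs on its own. One assumption goes beyond the text above: the comments are removed after their tails have been collected, because otherwise the tail text would appear both inside the CDATA section and after each comment. If the comments need to be kept, that step would have to change.

```python
from lxml import etree

# Sample input mirroring the question's data.xml (inlined for a self-contained sketch).
XML = """<root>
<tag>
some data
<!-- some data2 -->
<!-- some data2 -->
some data
</tag>
</root>"""

root = etree.fromstring(XML)

for tag in root.findall("tag"):
    # itertext() yields tag.text plus the tail of every child node;
    # the bodies of the comments themselves are not included.
    joined = "".join(tag.itertext())

    # Assumption: the comments can be dropped. lxml's remove() takes each
    # child's tail with it, so the text is not written out a second time.
    for child in list(tag):
        tag.remove(child)

    tag.text = etree.CDATA(joined)

result = etree.tostring(root, encoding="utf-8", xml_declaration=True)
print(result.decode("utf-8"))
```

To write to a file as in the question, the same loop can be applied to a tree returned by etree.parse, followed by tree.write.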