Removing tags with lxml - unexpected result when calling remove() while iterating


I have an XML document and need to remove a few types of tags. I use the iter method to visit each element. I noticed that when a removed element has a nested child, the child is still visited and removed, but the element that follows the removed one is never visited, so it is not removed. In the example below, the tag delete2 is a child of delete1 and both are removed, while the section tag after them is left in place. Is this a bug, or did I miss something? Thanks.

# %%
from lxml import etree

# unexpected output when nested elements are deleted
xml_str = """
<spdoc>
  <commentary>
    <body>
      <delete1>
        <delete2>
        </delete2>
      </delete1>
      <section name="delete">
      </section>
    </body>
  </commentary>
</spdoc>
"""

root = etree.fromstring(xml_str)
for element in root.iter():
    is_remove = False
    if element.tag == "delete1":
        is_remove = True
    if element.tag == "delete2":
        is_remove = True
    if element.tag == "section" and element.attrib.get("name") == "delete":
        is_remove = True
    print(f"{element} {is_remove}")
    if is_remove:
        element.getparent().remove(element)

print(etree.tostring(root, encoding="utf-8").decode("utf-8"))

The unexpected output is:

<Element spdoc at 0x103489680> False
<Element commentary at 0x103489640> False
<Element body at 0x1034b5540> False
<Element delete1 at 0x1037eb700> True
<Element delete2 at 0x1037eba40> True
<spdoc>
    <commentary>
        <body>
            <section name="delete">
            </section>
        </body>
    </commentary>
</spdoc>
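
One likely explanation is that root.iter() walks the tree lazily, so removing an element in the middle of the loop detaches the subtree the iterator is currently standing in. The iterator then finishes inside the detached delete1 subtree and never reaches section. A minimal sketch of a common workaround, assuming that is the cause: collect the matching elements first, then remove them in a second pass. The helper name should_remove is hypothetical and only restates the three conditions from the code above.

```python
from lxml import etree

xml_str = """
<spdoc>
  <commentary>
    <body>
      <delete1>
        <delete2>
        </delete2>
      </delete1>
      <section name="delete">
      </section>
    </body>
  </commentary>
</spdoc>
"""


def should_remove(element):
    # Hypothetical helper: the same three conditions as in the question.
    if element.tag in ("delete1", "delete2"):
        return True
    return element.tag == "section" and element.attrib.get("name") == "delete"


root = etree.fromstring(xml_str)

# Take a snapshot of the matches before changing the tree,
# so that removals cannot disturb the traversal.
to_remove = [el for el in root.iter() if should_remove(el)]

for el in to_remove:
    parent = el.getparent()
    if parent is not None:  # the root element has no parent
        parent.remove(el)

result = etree.tostring(root, encoding="unicode")
print(result)
```

Wrapping the iterator in a list, as in for element in list(root.iter()), should have the same effect with a smaller change to the original loop.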