xml.etree.ElementTree.Element.remove not removing all elements

Question

3.3k views Asked by jamadagni At 11 September 2024 at 08:18

Please see the following code:

import xml.etree.ElementTree as ET
for x in ("<a><b /><c><d /></c></a>", "<a><q /><b /><c><d /></c></a>", "<a><m /><q /><b /><c><d /></c></a>"):
    root = ET.fromstring(x)
    for e in root: root.remove(e)
    print(ET.tostring(root))

I expect it to output <a></a> in all instances but instead it gives:

b'<a><c><d /></c></a>'
b'<a><b /></a>'
b'<a><q /><c><d /></c></a>'

I totally don't grok this. I don't see any pattern to the specific elements that were removed either.

The documentation merely says:

Removes subelement from the element. Unlike the find* methods this method compares elements based on the instance identity, not on tag value or contents.

What am I doing/assuming wrong? I am getting basically the same output with both Python 2.7.5 and 3.4.0 on Kubuntu Trusty.

Thanks!

There are 2 answers

Vivek Sable On 16 June 2015 at 06:25

Yes. Get all the children of the root element and remove them one by one in reverse order.

For example:

In [1]: import xml.etree.ElementTree as ET 
In [2]: content = "<a><b /><c><d /></c></a>"
In [15]: root = ET.fromstring(content)
In [16]: for e in root.getchildren()[::-1]:
   ....:     print e
   ....:     root.remove(e)
   ....:     
<Element 'c' at 0xb60890ac>
<Element 'b' at 0xb608908c>

In [17]: ET.tostring(root)
Out[17]: '<a />'

With your code, only one element is removed:

In [21]: root = ET.fromstring(content)
In [22]: for e in root:
   ....:     print "Element:", e
   ....:     root.remove(e)
   ....:     
Element: <Element 'b' at 0xb608936c>

In [23]: ET.tostring(root)
Out[23]: '<a><c><d /></c></a>'

Without reversing, iterating over getchildren() directly has the same problem:

In [45]: root = ET.fromstring(content)

In [46]: for e in root.getchildren():
   ....:     print "Element:", e
   ....:     root.remove(e)
   ....:     
Element: <Element 'b' at 0xb6219dcc>

In [47]: ET.tostring(root)
Out[47]: '<a><c><d /></c></a>'
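
On current Python versions, Element.getchildren() is no longer available (it was deprecated and then removed in Python 3.9), so the reverse-order loop above needs a small adaptation. A minimal sketch, assuming Python 3: list(root) takes a snapshot of the children, and that copy is what makes the loop safe; reversing it is optional.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("<a><b /><c><d /></c></a>")

# list(root) copies the child list, so removing from root does not
# disturb the sequence the loop is walking over.
for child in reversed(list(root)):
    root.remove(child)

remaining = len(root)            # number of children still attached
serialized = ET.tostring(root)   # bytes in Python 3
print(remaining, serialized)
```

The variable names remaining and serialized are only there for illustration.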

mhawke (accepted answer) On 16 June 2015 at 06:33

This demonstrates the problem:

>>> root = ET.fromstring("<a><b /><c><d /></c></a>")
>>> for e in root:
...     print(e)
... 
<Element 'b' at 0x7f76c6d6cd18>
<Element 'c' at 0x7f76c6d6cd68>
>>> for e in root:
...     print(e)
...     root.remove(e)
...
<Element 'b' at 0x7f76c6d6cd18>

So modifying the object that you are iterating over affects the iteration. This is not entirely unexpected; the same thing happens if you alter a list while iterating over it:

>>> l = [1, 2, 3, 4]
>>> for i in l:
...     l.remove(i)
...
>>> print(l)
[2, 4]
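
The same position-based stepping would explain the pattern in the question's output. The sketch below is an illustration, not ElementTree's actual implementation: it assumes that iteration advances a position counter over the live child list, and the helper name survivors_after_naive_removal is made up for this example.

```python
import xml.etree.ElementTree as ET

def survivors_after_naive_removal(xml_text):
    """Replay `for e in root: root.remove(e)` with an explicit index."""
    root = ET.fromstring(xml_text)
    index = 0
    while index < len(root):
        # Removing the child at `index` shifts every later child left by one...
        root.remove(root[index])
        # ...so advancing the index skips the child that just moved into this slot.
        index += 1
    return [child.tag for child in root]

results = [
    survivors_after_naive_removal(x)
    for x in (
        "<a><b /><c><d /></c></a>",
        "<a><q /><b /><c><d /></c></a>",
        "<a><m /><q /><b /><c><d /></c></a>",
    )
]
print(results)
```

Under that assumption, every second child survives, which matches the three outputs shown in the question.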

As a workaround, you can repeatedly remove the first subelement like this:

import xml.etree.ElementTree as ET
for x in ("<a><b /><c><d /></c></a>", "<a><q /><b /><c><d /></c></a>", "<a><m /><q /><b /><c><d /></c></a>"):
    root = ET.fromstring(x)
    for i in range(len(root)):
        root.remove(root[0])
    print(ET.tostring(root))

Output

b'<a />'
b'<a />'
b'<a />'

This works because range(len(root)) is computed once, before the loop starts, so the loop does not iterate over root itself while it is being modified. Alternatively, if you want to remove all subelements of the root element together with all of its attributes, you can use root.clear():

>>> root = ET.fromstring('<a href="blah"><b /><c><d /></c></a>')
>>> root.clear()
>>> ET.tostring(root)
b'<a />'
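
Note that clear() also drops the attributes (href in the example above) along with the element's text and tail. If the attributes should survive, one option is to loop over a copy of the child list instead. A minimal sketch, assuming Python 3; the variable names are illustrative:

```python
import xml.etree.ElementTree as ET

root = ET.fromstring('<a href="blah"><b /><c><d /></c></a>')

# Iterate over a snapshot of the children so that remove() cannot
# affect the sequence being looped over.
for child in list(root):
    root.remove(child)

kept_attributes = dict(root.attrib)  # attributes are left untouched
serialized = ET.tostring(root)
print(kept_attributes, serialized)
```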
