xml.etree.ElementTree.Element.remove not removing all elements


Please see the following code:

import xml.etree.ElementTree as ET
for x in ("<a><b /><c><d /></c></a>", "<a><q /><b /><c><d /></c></a>", "<a><m /><q /><b /><c><d /></c></a>"):
    root = ET.fromstring(x)
    for e in root: root.remove(e)
    print(ET.tostring(root))

I expect it to output <a></a> in all instances but instead it gives:

b'<a><c><d /></c></a>'
b'<a><b /></a>'
b'<a><q /><c><d /></c></a>'

I totally don't grok this. I don't see any pattern to the specific elements that were removed either.

The documentation merely says:

Removes subelement from the element. Unlike the find* methods this method compares elements based on the instance identity, not on tag value or contents.

What am I doing/assuming wrong? I am getting basically the same output with both Python 2.7.5 and 3.4.0 on Kubuntu Trusty.

Thanks!


There are 2 answers

Answer by mhawke (best answer)

This demonstrates the problem:

>>> root = ET.fromstring("<a><b /><c><d /></c></a>")
>>> for e in root:
...     print(e)
... 
<Element 'b' at 0x7f76c6d6cd18>
<Element 'c' at 0x7f76c6d6cd68>
>>> for e in root:
...     print(e)
...     root.remove(e)
...
<Element 'b' at 0x7f76c6d6cd18>

So, modifying the object that you are iterating over affects the iteration. This is not entirely unexpected: the same thing happens if you alter a list while iterating over it:

>>> l = [1, 2, 3, 4]
>>> for i in l:
...     l.remove(i)
...
>>> print(l)
[2, 4]
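
To make the pattern visible, here is a minimal sketch of what the for loop is effectively doing, assuming the iterator simply walks forward by position (as it does for a plain list). The helper name remove_while_iterating is made up for illustration and is not part of ElementTree:

```python
def remove_while_iterating(items):
    """Mimic `for x in items: items.remove(x)` using an explicit index."""
    items = list(items)          # work on a copy of the caller's data
    visited = []
    index = 0
    while index < len(items):
        current = items[index]
        visited.append(current)
        items.remove(current)    # everything after `current` shifts left by one
        index += 1               # so advancing the index skips one element
    return visited, items

visited, left = remove_while_iterating(["m", "q", "b", "c"])
print(visited)  # ['m', 'b']
print(left)     # ['q', 'c']
```

Every second item survives, which matches the third output in the question, where q and c are left behind.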

As a workaround you can repeatedly remove the first subelement like this:

import xml.etree.ElementTree as ET
for x in ("<a><b /><c><d /></c></a>", "<a><q /><b /><c><d /></c></a>", "<a><m /><q /><b /><c><d /></c></a>"):
    root = ET.fromstring(x)
    for i in range(len(root)):
        root.remove(root[0])
    print(ET.tostring(root))

Output

b'<a />'
b'<a />'
b'<a />'

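This works because range(len(root)) is evaluated once, before anything is removed, so the loop no longer iterates over the element it is modifying. Or, if you want to remove all subelements of the root element and all of its attributes (its text and tail are reset as well), you can use root.clear():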

>>> root = ET.fromstring('<a href="blah"><b /><c><d /></c></a>')
>>> root.clear()
>>> ET.tostring(root)
b'<a />'
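
If you want to drop the children but keep the attributes, one more option (not from the code above, just a sketch) is to iterate over a snapshot of the children made with list(). The helper name remove_children is hypothetical:

```python
import xml.etree.ElementTree as ET

def remove_children(element):
    """Remove every direct child of `element`, keeping its attributes."""
    # list(element) is a snapshot, so removing from `element` is safe here.
    for child in list(element):
        element.remove(child)

doc = '<a href="blah"><m /><q /><b /><c><d /></c></a>'

kept = ET.fromstring(doc)
remove_children(kept)
print(len(kept), kept.attrib)        # children gone, href still there

cleared = ET.fromstring(doc)
cleared.clear()
print(len(cleared), cleared.attrib)  # children and attributes both gone
```

The loop runs over the copy, so removals from the element cannot shift anything under the iterator.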

Answer by Vivek Sable

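Yes. Get all the children of the root element and remove them one by one in reverse order. (Note that getchildren() was deprecated and then removed in Python 3.9; list(root) is the equivalent there.)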

E.g.

In [1]: import xml.etree.ElementTree as ET 
In [2]: content = "<a><b /><c><d /></c></a>"
In [15]: root = ET.fromstring(content)
In [16]: for e in root.getchildren()[::-1]:
   ....:     print e
   ....:     root.remove(e)
   ....:     
<Element 'c' at 0xb60890ac>
<Element 'b' at 0xb608908c>

In [17]: ET.tostring(root)
Out[17]: '<a />'
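
The session above is Python 2. A rough Python 3 equivalent of the same reverse-order idea, assuming list(root) as the replacement for getchildren(), might look like this:

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("<a><b /><c><d /></c></a>")

removed = []
# list(root) copies the child list, so removals cannot disturb the loop.
for child in reversed(list(root)):
    removed.append(child.tag)
    root.remove(child)

print(removed)            # ['c', 'b']
print(ET.tostring(root))  # b'<a />'
```

Because the loop walks a copy, the reverse order is not strictly needed here; it is kept only to mirror the session above.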

With your code, only one element is removed from this input. E.g.

In [21]: root = ET.fromstring(content)
In [22]: for e in root:
   ....:     print "Element:", e
   ....:     root.remove(e)
   ....:     
Element: <Element 'b' at 0xb608936c>

In [23]: ET.tostring(root)
Out[23]: '<a><c><d /></c></a>'

Without reversing, iterating over getchildren() directly gives the same result in this session:

In [45]: root = ET.fromstring(content)

In [46]: for e in root.getchildren():
   ....:     print "Element:", e
   ....:     root.remove(e)
   ....:     
Element: <Element 'b' at 0xb6219dcc>

In [47]: ET.tostring(root)
Out[47]: '<a><c><d /></c></a>'