How to remove items from lxml tree if I have a given set of elements I want to keep?


I am writing a Python XML (NETCONF) parser. The goal is to get an rpc-reply XML from the server, modify some items, and produce a minimal config .xml that can then be sent back to the server.

When modifying values in the GUI, I add the modified elements to a set, along with their ancestor elements and any sibling elements that don't contain children, since that is the content of the "minimum viable" result file.
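The selection rule described above could be sketched roughly as follows. This is only an illustration: the helper name collect_items_to_keep and the sample XML are made up, and it assumes "siblings" means the siblings of each modified element only.

```python
from lxml import etree


def collect_items_to_keep(modified_elements):
    """Return the modified elements plus their ancestors and childless siblings."""
    keep = set()
    for elem in modified_elements:
        keep.add(elem)
        # Every ancestor up to the root is needed to keep the path intact.
        keep.update(elem.iterancestors())
        parent = elem.getparent()
        if parent is not None:
            # "*" restricts iteration to element nodes (no comments or PIs).
            for sibling in parent.iterchildren("*"):
                if len(sibling) == 0:  # no child elements
                    keep.add(sibling)
    return keep


# Hypothetical, heavily shortened input without namespaces.
root = etree.fromstring(
    b"<root><interface><name>PORT_0</name><description>random</description>"
    b"<bridge-port><x/></bridge-port></interface><keystore><k/></keystore></root>"
)
description = root.find("interface/description")
description.text = "uplink"

keep = collect_items_to_keep([description])
print(sorted(e.tag for e in keep))
# prints ['description', 'interface', 'name', 'root']
```

Note that <bridge-port> is left out because it has a child, and <keystore> is left out because it is neither an ancestor nor a sibling of the modified element.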

Example (shortened) XML that I'm processing:

<rpc-reply xmlns="urn:ietf:params:xml:ns:netconf:base:1.0" message-id="urn:uuid:a1cfef75-dba4-4fdf-81eb-8d5f65d35511">
  <data>
    <bridges xmlns="urn:ieee:std:802.1Q:yang:ieee802-dot1q-bridge">
      <bridge>
        (...)
      </bridge>
    </bridges>
    <interfaces xmlns="urn:ietf:params:xml:ns:yang:ietf-interfaces">
      <interface>
        <name>PORT_0</name>
        <description>random</description>
        <type xmlns:ianaift="urn:ietf:params:xml:ns:yang:iana-if-type">ianaift:ethernetCsmacd</type>
        <bridge-port xmlns="urn:ieee:std:802.1Q:yang:ieee802-dot1q-bridge">
            (...)
        </bridge-port>
      </interface>
      <interface>
        (...)
      </interface>
    </interfaces>
    <keystore xmlns="urn:ietf:params:xml:ns:yang:ietf-keystore">
        (...)
    </keystore>
  </data>
</rpc-reply>

I have found a problem when using .iter() and .remove() together: when I want to modify e.g. <description>, the loop removes only the <bridge> branch, and then .iter() does not go back to <interfaces> or to its immediate ancestor, most likely because I've already removed every piece of information about the ancestors. In other words, the .iter() loop stops at the first "last leaf" element it encounters.

I'm using the following code to remove items; self.itemstokeep is a set of etree.Element objects to keep:

for item in treecopy.iter():
    if item not in self.itemstokeep:
        if item.getparent() is not None:
            item.getparent().remove(item)

Can you recommend any nice way to solve this or completely work around the problem?
The biggest difference from the answers I've found here so far is that I won't know which items to remove, only which ones to keep, and I won't always have the same input structure, with the exception of the two top-level elements, which makes the usual XPath approach complicated.

I've also thought about dropping the itemstokeep set and instead rebuilding a tree as the elements are modified, but that seemed like an inefficient solution, since I would always need to check for duplicates among ancestors and iterate through the tree a lot. Maybe I'm missing something there as well.


There are 3 answers

Maciej On BEST ANSWER

The posted answers did not work for me. In case anyone has a similar problem in the future, I solved this with a two-loop workaround:

  1. The first loop builds a "negative" of the item set, i.e. the set of items I want to remove, which starts out empty: deleteset = set()

  2. The second loop iterates over that set and removes the collected elements

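        deleteset = set()

        # Pass 1: only collect; the tree is not modified while .iter() runs
        for item in treecopy.iter():
            if item not in self.copyitems_to_keep and item.getparent() is not None:
                deleteset.add(item)

        # Pass 2: remove the collected elements
        for item in deleteset:
            item.getparent().remove(item)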
    

Thanks to Hermann12's comments on the original question, I also realized my mistake in another part of the code: originally I did not use deepcopy() to create the treecopy root element, which was causing another class of problems in the application.
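One detail worth noting about deepcopy(): the copied tree consists of new element objects, so a set built from the original tree's elements will not match anything in treecopy. Below is a minimal sketch of one way to translate such a set. It relies on both trees yielding their elements in the same document order. The names items_to_keep and copy_of and the sample XML are illustrative, and this is not necessarily how copyitems_to_keep was built in the original application.

```python
from copy import deepcopy

from lxml import etree

# Hypothetical miniature tree standing in for the rpc-reply document.
root = etree.fromstring(b"<data><a><b/></a><c/></data>")
items_to_keep = {root, root.find("a"), root.find("a/b")}

treecopy = deepcopy(root)

# deepcopy preserves document order, so iterating both trees in parallel
# pairs every original element with its copy.
copy_of = dict(zip(root.iter(), treecopy.iter()))

# Translate the keep set so that it refers to elements of the copied tree.
copyitems_to_keep = {copy_of[item] for item in items_to_keep}

print(sorted(e.tag for e in copyitems_to_keep))
# prints ['a', 'b', 'data']
```

After this step, membership tests against copyitems_to_keep work while iterating over treecopy, and the original tree stays untouched.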

In case anyone stumbles on this thread in the future, I would still love to know if there's a way to force .iter() not to descend into a branch that no longer exists once an element containing children has been removed.
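I am not aware of an option that makes .iter() itself skip a removed branch, but one possible alternative is to avoid .iter() and recurse manually, descending only into children that are kept. The sketch below is illustrative: the function name prune and the sample XML are made up, and it assumes the keep set already contains the ancestors of every kept element.

```python
from lxml import etree


def prune(parent, items_to_keep):
    """Remove every child not in items_to_keep; descend only into kept children."""
    # list(parent) takes a snapshot of the children, so removing a child
    # does not disturb the loop.
    for child in list(parent):
        if child in items_to_keep:
            prune(child, items_to_keep)
        else:
            # The whole subtree goes away; it is never visited.
            parent.remove(child)


# Hypothetical shortened input without namespaces.
root = etree.fromstring(
    "<data><bridges><bridge/></bridges>"
    "<interfaces><interface><name>PORT_0</name>"
    "<description>random</description></interface></interfaces>"
    "<keystore/></data>"
)
interfaces = root.find("interfaces")
interface = interfaces.find("interface")
items_to_keep = {root, interfaces, interface, interface.find("description")}

prune(root, items_to_keep)
print(etree.tostring(root).decode())
# prints <data><interfaces><interface><description>random</description></interface></interfaces></data>
```

Because each level iterates over a snapshot of its own children, removing <bridges> has no effect on the later visit to <interfaces>.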

Adrián Prestamo On

Here's a solution. You basically need to create a new empty tree where you add all the items you want to keep, instead of removing the ones that don't match the condition.

from lxml import etree as ET  # assumption: lxml, as in the question

# Parse the XML (xml_data is assumed to hold the rpc-reply document)
root = ET.fromstring(xml_data)

# Set of elements to keep (you have this already); it must include the root
items_to_keep = set()

# A function to recursively copy elements you want to keep
def copy_elements(element):
    if element not in items_to_keep:
        # Not in the items to keep: skip this element and its whole subtree
        return None
    # Clone the element with its attributes and namespace map
    new_element = ET.Element(element.tag, dict(element.attrib), nsmap=element.nsmap)
    # Copy the text (if any)
    new_element.text = element.text
    new_element.tail = element.tail
    # Recursively copy child elements, skipping those that were dropped
    for child in element:
        new_child = copy_elements(child)
        if new_child is not None:
            new_element.append(new_child)
    return new_element

# Create a new XML tree, starting from the root
new_root = copy_elements(root)

# Wrap the new root in an ElementTree
new_tree = ET.ElementTree(new_root)

# Serialize the new tree to XML
new_xml = ET.tostring(new_root, encoding='unicode')

print(new_xml)
Momin Ali On

One way to solve this problem is by creating a new XML tree, starting with the root element, and adding only the elements you want to keep. This way, you avoid the issue of removing elements from an existing tree and ensure that the resulting tree contains only the desired elements. Here's how you can do it:

from lxml import etree as ET  # nsmap exists only in lxml, not in xml.etree

# Assuming your original XML is stored in the 'xml_string' variable
root = ET.fromstring(xml_string)

# Create the new tree's root as a shallow copy of the original root
new_tree = ET.Element(root.tag, attrib=dict(root.attrib), nsmap=root.nsmap)

# Local tag names (namespace stripped) to keep; ancestors such as "data"
# must be listed too, or their descendants are never reached
elements_to_keep = {"data", "interfaces", "interface", "name", "description"}

# Initialize a stack of (original element, its copy) pairs
ancestors = [(root, new_tree)]

# Walk the original tree without modifying it
while ancestors:
    original, copy = ancestors.pop()
    # "*" yields element children only (no comments or processing instructions)
    for child in original.iterchildren("*"):
        if ET.QName(child).localname in elements_to_keep:
            # Add a shallow copy of the child to the copy of its parent
            new_child = ET.SubElement(copy, child.tag, attrib=dict(child.attrib), nsmap=child.nsmap)
            new_child.text = child.text
            # Push the pair so the child's own children get visited
            ancestors.append((child, new_child))

# Convert the new tree to a string
new_xml_string = ET.tostring(new_tree).decode()

# Print the resulting XML
print(new_xml_string)