XML file parsing - Get data from each parent and their own children


I would like to get data from each parent element and its own children in an XML file.

I'm trying to parse this XML file

<DB>
    <Entry>
        <Name>Assembly.iam</Name>
        <DisplayName>Assembly.iam</DisplayName>
        <Scalar>
            <Name>d0</Name>
            <DisplayName>d0 (value = 0 mm)</DisplayName>
            <Value>0</Value>
        </Scalar>
        <Scalar>
            <Name>d1</Name>
            <DisplayName>d1 (value = 0 mm)</DisplayName>
            <Value>0</Value>
        </Scalar>
    </Entry>
    <Entry>
        <Name>Ground.ipt</Name>
        <DisplayName>Ground.ipt</DisplayName>
        <Scalar>
            <Name>Ground_length</Name>
            <DisplayName>Ground_length (value = 160 mm)</DisplayName>
            <Value>160</Value>
        </Scalar>
        <Scalar>
            <Name>d2</Name>
            <DisplayName>d2 (value = 80 mm)</DisplayName>
            <Value>80</Value>
        </Scalar>
    </Entry>
</DB>

Specifically, I would like to get the text inside each <DisplayName></DisplayName> element, and then put that data into a list of tuples like this:

[(Assembly.iam, [d0 (value = 0 mm), d1 (value = 0 mm)]),
 (Ground.ipt, [Ground_length (value = 160 mm), d2 (value = 80 mm)])]

I have tried to use the xml.etree.cElementTree library with this code

from xml.etree import cElementTree
import numpy as np

workingDir = "C:/Users/Vince/Test"
newStrWorkingDir = str.replace(workingDir, '/', '\\')
tree = cElementTree.parse(newStrWorkingDir + "\\test.xml")
root = tree.getroot()
tab = np.empty(shape=(0, 0))
tabEntry = np.empty(shape=(0, 0))
tabScalar = np.empty(shape=(0, 0))

for entry in root.findall('Entry'):
    entryNames = entry.findall("./DisplayName")
    entryNamesText = entry.find("./DisplayName").text
    tabEntry = np.append(tabEntry,entryNamesText)
    for scalar in entry.findall('Scalar'):
        scalarNames = scalar.findall("./DisplayName")
        scalarNamesText = scalar.find("./DisplayName").text
        tabScalar = np.append(tabScalar,scalarNamesText)
        tab = np.append(tab,(entryNamesText,scalarNamesText))

print(tab)

But it gives me this flat output instead:

['Assembly.iam' 'd0 (value = 0 mm)'
'Assembly.iam' 'd1 (value = 0 mm)'
'Ground.ipt' 'Ground_length (value = 160 mm)' 
'Ground.ipt' 'd2 (value = 80 mm)']

There is 1 answer

Daniel On BEST ANSWER

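To get the structure you want, build a plain Python list of tuples, where each tuple pairs an entry's name with a list of its scalar names. NumPy arrays are a poor fit here, because np.append flattens its arguments when no axis is given: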

import os
from xml.etree import ElementTree  # cElementTree was removed in Python 3.9

workingDir = "C:\\Users\\Vince\\Test"
tree = ElementTree.parse(os.path.join(workingDir, "test.xml"))
root = tree.getroot()
tab = []

for entry in root.findall('Entry'):
    # DisplayName of the entry itself (direct child only)
    entry_name = entry.findtext("./DisplayName")
    # DisplayName of every Scalar directly under this entry
    scalar_names = [e.text for e in entry.findall('Scalar/DisplayName')]
    tab.append((entry_name, scalar_names))
print(tab)
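
If you want to try this without the file on disk, here is a minimal self-contained sketch of the same approach. It assumes the XML is available as a string (XML_TEXT below is a shortened copy of the question's data, keeping only the DisplayName elements), and the helper name collect_display_names is made up for illustration:

```python
from xml.etree import ElementTree

# Shortened inline copy of the question's XML (assumption: only the
# DisplayName elements matter here), so no file on disk is needed.
XML_TEXT = """<DB>
    <Entry>
        <DisplayName>Assembly.iam</DisplayName>
        <Scalar><DisplayName>d0 (value = 0 mm)</DisplayName></Scalar>
        <Scalar><DisplayName>d1 (value = 0 mm)</DisplayName></Scalar>
    </Entry>
    <Entry>
        <DisplayName>Ground.ipt</DisplayName>
        <Scalar><DisplayName>Ground_length (value = 160 mm)</DisplayName></Scalar>
        <Scalar><DisplayName>d2 (value = 80 mm)</DisplayName></Scalar>
    </Entry>
</DB>"""


def collect_display_names(xml_text):
    """Hypothetical helper: return [(entry_name, [scalar_names...]), ...]."""
    root = ElementTree.fromstring(xml_text)
    result = []
    for entry in root.findall("Entry"):
        # "DisplayName" matches direct children only, so this is the
        # entry's own name, not the name of one of its scalars.
        entry_name = entry.findtext("DisplayName")
        # One path step deeper: the DisplayName of each Scalar child.
        scalar_names = [e.text for e in entry.findall("Scalar/DisplayName")]
        result.append((entry_name, scalar_names))
    return result


tab = collect_display_names(XML_TEXT)
for entry_name, scalar_names in tab:
    print(entry_name, scalar_names)
```

Each tuple keeps an entry together with its own scalars, which is the grouping the flat NumPy array lost.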