Processing an XML file where tag data may contain processing instructions


I am using Python to extract the coordinate data from a non-standard XML file. The issue is that some of the coordinates tags contain XML processing instructions, for example:

<?xml version="1.0" encoding="UTF-8"?>
<kml>
    <Document>
        <Folder>
            <Placemark id="CAR1">
            <coordinates>
                28.236931382585154893,36.380279408060658852,200<?AtTime 0.000000?>
                28.236862947306430982,36.380324400967296583,200<?AtTime 2.333333?>
                28.236862521767100986,36.380323843439811071,200<?AtTime 2.500000?>
            </coordinates>
                </Placemark>
        <Placemark id="CAR2">
            <coordinates>
                28.236931382585154893,36.380279408060658852,200
                28.236862947306430982,36.380324400967296583,200
                28.236862521767100986,36.380323843439811071,200
            </coordinates>
        </Placemark>
       </Folder>
    </Document>
</kml>

Below is my test code:

from xml.dom import minidom
myFile = "test.xml"
mydoc = minidom.parse(myFile)
items = mydoc.getElementsByTagName('Placemark')
for elem in items:
    print("coordinates ",elem.attributes['id'].value)
    coorTag = elem.getElementsByTagName('coordinates')
    if len(coorTag) > 0:
        for elem in coorTag:
            theData = elem.firstChild.nodeValue
            print(theData)
            

This generates the following output.

coordinates  CAR1

                28.236931382585154893,36.380279408060658852,200
coordinates  CAR2

                28.236931382585154893,36.380279408060658852,200
                28.236862947306430982,36.380324400967296583,200
                28.236862521767100986,36.380323843439811071,200

How can I obtain all the coordinate data even when the content of a given tag contains processing instructions?
Is this just a limitation of minidom, or would an alternative Python library be a better choice?

The output I am looking for is:

coordinates  CAR1

                28.236931382585154893,36.380279408060658852,200<?AtTime 0.000000?>
                28.236862947306430982,36.380324400967296583,200<?AtTime 2.333333?>
                28.236862521767100986,36.380323843439811071,200<?AtTime 2.500000?>
coordinates  CAR2

                28.236931382585154893,36.380279408060658852,200
                28.236862947306430982,36.380324400967296583,200
                28.236862521767100986,36.380323843439811071,200
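
One possible approach that stays with minidom, offered as a sketch rather than a definitive fix: the parser stores the content of <coordinates> as a sequence of child nodes (text, processing instruction, text, and so on), so firstChild.nodeValue only returns the text before the first <?AtTime ...?>. Walking over all childNodes and joining the text nodes with the serialized processing instructions should give the output described above. The helper name coordinate_text and the inlined SAMPLE string are made up for illustration; the real script would keep calling minidom.parse("test.xml").

```python
from xml.dom import minidom, Node

# Shortened stand-in for test.xml (assumption: same structure as the file above).
SAMPLE = """<kml>
  <Document>
    <Folder>
      <Placemark id="CAR1">
        <coordinates>
          28.2369,36.3802,200<?AtTime 0.000000?>
          28.2368,36.3803,200<?AtTime 2.333333?>
        </coordinates>
      </Placemark>
      <Placemark id="CAR2">
        <coordinates>
          28.2369,36.3802,200
          28.2368,36.3803,200
        </coordinates>
      </Placemark>
    </Folder>
  </Document>
</kml>"""


def coordinate_text(coord_elem):
    """Rebuild the element content from its text and processing-instruction children."""
    parts = []
    for child in coord_elem.childNodes:
        if child.nodeType == Node.TEXT_NODE:
            # Plain coordinate text between processing instructions.
            parts.append(child.data)
        elif child.nodeType == Node.PROCESSING_INSTRUCTION_NODE:
            # toxml() serializes the node back to "<?target data?>".
            parts.append(child.toxml())
    return "".join(parts)


doc = minidom.parseString(SAMPLE)
for placemark in doc.getElementsByTagName("Placemark"):
    print("coordinates ", placemark.getAttribute("id"))
    for coords in placemark.getElementsByTagName("coordinates"):
        print(coordinate_text(coords))
```

If the AtTime values are needed as data rather than as markup, the processing-instruction nodes also expose child.target (here "AtTime") and child.data (the number as a string), which could be paired with the coordinate text that precedes each one.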