Python XML Parsing can't find children of children


I'm trying to parse XML returned as a string from an HTTP GET request. I need to get a specific link inside the XML structure, but for some reason I can't reach it. I tried **enumerating** the XML and printing child.attrib, but the link I need is not displayed.

I need to find an element that is a child of a child and the element is called Vm, then I need to get the .attrib of that element.

So I did some more research and tried finding the element I needed by node name.

The XML structure is:

<vapp>
   <link></link>
   <othertags></othertags>
   <Children>
      <Vm href='link I need'>
         <other tag options>
         </other tag options>
      </Vm>
   </Children>
</vapp>

Python code:

for i, child in enumerate(vappXML):
   if 'href' in child.attrib and 'name' in child.attrib:
      vapp_url = child.attrib['href']

      r = requests.get(vapp_url, headers=new_headers)
      vmlinkXML = fromstring(r.content)

      for VM in vmlinkXML.findall('Children'):
         print(VM)

      for i, child in enumerate(vmlinkXML):
         if 'vm-' in child:
            print(child.attrib)

         if 'href' in child.attrib:
            vm_url = child.attrib['href']
            if 'vm-' in vm_url:
               print(vm_url)

I can't get to the URL no matter how I try. I only get the top-level children of the vApp; it never parses the <Children> tag, or rather my code never goes further than the first level of children of the vApp, and I don't know why.

I guess I wasn't clear. I'm parsing vCloud Director REST API XML that is returned as a string. The first level is the vApp link, which is essentially a container for VMs. I need to get the VM link under each vApp. The first loop selects the vApp links and queries them.

Once it does a GET request on the vApp link, it gets the next level of XML data, which is the structure I put above. So it gets past the first XML response and returns the vApp information.

Even when I print out every child.attrib from vmlinkXML, the link with vm doesn't get printed. However, if I just print r.content, the link is there. It's almost like the XML parser doesn't see the tag.
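A minimal sketch of why this can happen, assuming the response resembles the simplified structure above (the sample document and the example.invalid links are made up for illustration): iterating over an ElementTree element visits only its direct children, so a Vm nested inside Children never shows up, while iter() walks the whole subtree.

```python
from xml.etree.ElementTree import fromstring

# Hypothetical sample mirroring the simplified structure in the question.
SAMPLE = """
<VApp href="https://example.invalid/api/vApp/vapp-1">
   <Link href="https://example.invalid/api/link"/>
   <Children>
      <Vm href="https://example.invalid/api/vApp/vm-1"/>
   </Children>
</VApp>
"""

root = fromstring(SAMPLE)

# Iterating over an element visits only its direct children.
direct_tags = [child.tag for child in root]

# iter() walks the element itself plus every descendant.
all_tags = [elem.tag for elem in root.iter()]

print(direct_tags)
print(all_tags)
```

In this sketch, Vm appears only in the second list, which matches the symptom of the link being present in r.content but never printed by the loop.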

I'm using Python's xml.etree:

from lxml import etree
from xml.etree.ElementTree import XML, fromstring, tostring

So to be clear, the flow is:

To get the vApp links I call /api/admin/extension/vapps/query; the returned information contains links to each vApp in vCloud. Then I call a vApp link such as https://vcloud.test.co/api/vApp/vapp-3b4980e7-c5ab-4462-9cfe-abc6292c15748, and it returns a structure similar to this:

<vapp>
   <link></link>
   <othertags></othertags>
   <Children>
      <Vm href='link I need'>
         <other tag options>
         </other tag options>
      </Vm>
   </Children>
</vapp>

The <Children> tag contains the next level of link I need to query. However, iterating over the parsed XML and printing child.attrib never outputs anything under that tag.

There is 1 answer

Grant Zukel

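Solved. I switched to xml.dom.minidom for the vApp response and used getElementsByTagName, which searches all descendants of an element rather than only its direct children: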

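import requests
from xml.dom.minidom import parseString
from xml.etree.ElementTree import fromstring

# url and new_headers (the auth headers) are defined earlier in the script
r = requests.get(url + '/api/admin/extension/vapps/query', headers=new_headers)
vappXML = fromstring(r.content)
for child in vappXML:
   # records that carry both href and name are treated as vApps
   if 'href' in child.attrib and 'name' in child.attrib:
      vapp_url = child.attrib['href']

      r = requests.get(vapp_url, headers=new_headers)
      DOMTree = parseString(r.content)
      vmElements = DOMTree.documentElement
      # getElementsByTagName matches Vm elements at any depth
      VMS = vmElements.getElementsByTagName("Vm")

      for vm in VMS:
         if vm.hasAttribute("href"):
            vm_link = vm.getAttribute("href")
            print(vm_link)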
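
For comparison, here is a sketch of the same lookup done with xml.etree.ElementTree instead of minidom. It assumes the real response declares a default XML namespace, which would be one possible reason findall('Children') matched nothing. The namespace URI, the sample document, and the vm_links helper are illustrative assumptions, not taken from the actual API output.

```python
from xml.etree.ElementTree import fromstring

# Assumed namespace URI; check the xmlns attribute of the real response.
NS = "http://www.vmware.com/vcloud/v1.5"

# Hypothetical sample document with a default namespace.
SAMPLE = """
<VApp xmlns="http://www.vmware.com/vcloud/v1.5" href="https://example.invalid/api/vApp/vapp-1">
   <Link href="https://example.invalid/api/link"/>
   <Children>
      <Vm href="https://example.invalid/api/vApp/vm-1" name="vm-one"/>
      <Vm href="https://example.invalid/api/vApp/vm-2" name="vm-two"/>
   </Children>
</VApp>
"""


def vm_links(xml_text, namespace=NS):
    """Return the href of every Vm element anywhere in the document."""
    root = fromstring(xml_text)
    # With a default namespace, ElementTree stores tags as '{uri}Vm'.
    qualified_tag = "{%s}Vm" % namespace
    return [vm.get("href") for vm in root.iter(qualified_tag) if vm.get("href")]


links = vm_links(SAMPLE)
print(links)

# An unqualified name does not match a namespaced tag.
unqualified = fromstring(SAMPLE).findall("Children")
print(unqualified)
```

With the real response, r.content could be passed to vm_links in place of SAMPLE, since fromstring accepts bytes as well as text.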