How can I load XML data into a pandas DataFrame using lxml.objectify and display it as a table?

This is the XML file I want to process using lxml.objectify and pandas.DataFrame.
File: students.xml

<?xml version="1.0" encoding="UTF-8"?>

<college>
    <department>
        <name>Information Technology</name>
        <semester>
            <sem_3>
                <student_no>1</student_no>
                <student_name>Ravindra</student_name>
                <student_city>Ahmedabad</student_city>
            </sem_3>
        </semester>
    </department>
    <department>
        <name>Computer Engineering</name>
        <semester>
            <sem_3>
                <student_no>2</student_no>
                <student_name>Surya</student_name>
                <student_city>Gandhinagar</student_city>
            </sem_3>
        </semester>
    </department>
</college>

I tried the following, but this is the only output I could get:

import pandas as pd
from lxml import objectify
from pandas import DataFrame
xml = objectify.parse(open('students.xml'))
root = xml.getroot()
number = []
name = []
city = []
for i in range(0, 2):
  obj = root.getchildren()[i].getchildren()
  for j in range(0, 1):
    child_obj = obj[1].getchildren()[j].getchildren()
    number.append(child_obj[0])
    name.append(child_obj[1])
    city.append(child_obj[2])
df = pd.DataFrame(list(zip(number, name, city)), columns =['student_no', 'student_name', 'student_city'])
print(df)
-----------------------------------------------
  student_no    student_name       student_city
0    [[[1]]]  [[[Ravindra]]]    [[[Ahmedabad]]]
1    [[[2]]]     [[[Surya]]]  [[[Gandhinagar]]]
-----------------------------------------------

I'm not able to get output like this:

-----------------------------------------------
  student_no    student_name       student_city
0          1        Ravindra          Ahmedabad
1          2           Surya        Gandhinagar
-----------------------------------------------

Can you help me with this?

There is 1 answer

woblob (best answer)

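You were appending lxml objects to your lists, not their values. Each child_obj[k] is an lxml.objectify element, which pandas displays as nested brackets such as [[[1]]]. Take the element's .text (and convert it with int() where you want a number) before appending: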

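import pandas as pd
from lxml import objectify

# Parse the file and get the <college> root element
with open('students.xml') as f:
    xml = objectify.parse(f)
root = xml.getroot()

number = []
name = []
city = []
for i in range(0, 2):  # the two <department> elements
    # children of a department: [<name>, <semester>]
    obj = root.getchildren()[i].getchildren()
    for j in range(0, 1):  # the single <sem_3> inside <semester>
        # children of <sem_3>: [<student_no>, <student_name>, <student_city>]
        child_obj = obj[1].getchildren()[j].getchildren()
        # .text is the plain string content rather than the lxml element
        number.append(int(child_obj[0].text))
        name.append(child_obj[1].text)
        city.append(child_obj[2].text)

data = {'student_no': number, 'student_name': name, 'student_city': city}
df = pd.DataFrame(data)
print(df)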

Output:

  student_no student_name student_city
0          1     Ravindra    Ahmedabad
1          2        Surya  Gandhinagar
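
The loops above hard-code the number of departments and the position of each child. As an alternative, here is a minimal sketch that uses objectify's attribute access instead of indices, so it should keep working if more departments or semester blocks are added. The function name students_to_frame and the inlined XML constant are my own, for illustration only; the XML is the same content as students.xml from the question, inlined so the snippet runs without the file. It assumes every semester block contains student_no, student_name and student_city.

```python
import pandas as pd
from lxml import objectify

# Same content as students.xml in the question, inlined as bytes
# (an assumption made only so the sketch is self-contained).
XML = b"""<?xml version="1.0" encoding="UTF-8"?>
<college>
    <department>
        <name>Information Technology</name>
        <semester>
            <sem_3>
                <student_no>1</student_no>
                <student_name>Ravindra</student_name>
                <student_city>Ahmedabad</student_city>
            </sem_3>
        </semester>
    </department>
    <department>
        <name>Computer Engineering</name>
        <semester>
            <sem_3>
                <student_no>2</student_no>
                <student_name>Surya</student_name>
                <student_city>Gandhinagar</student_city>
            </sem_3>
        </semester>
    </department>
</college>
"""


def students_to_frame(xml_bytes):
    """Hypothetical helper: one DataFrame row per semester block."""
    root = objectify.fromstring(xml_bytes)
    rows = []
    # Iterating over root.department yields every <department> sibling.
    for department in root.department:
        # Each child of <semester> (here only <sem_3>) holds one student.
        for sem in department.semester.iterchildren():
            rows.append({
                # .pyval is the value objectify inferred (an int here)
                "student_no": sem.student_no.pyval,
                # .text is the raw string content of the element
                "student_name": sem.student_name.text,
                "student_city": sem.student_city.text,
            })
    return pd.DataFrame(rows)


df = students_to_frame(XML)
print(df)
```

To read from the file instead, objectify.parse('students.xml').getroot() gives the same root element. Note that .pyval relies on objectify's type inference, so use .text with an explicit int() if you want to control the conversion yourself.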