I have an XML file that is not well-formed. I get an error when I try to read it with ElementTree:
    import xml.etree.ElementTree as ET
    ET.parse(r'my.xml')
This is the error:
ParseError: not well-formed (invalid token): line 2034, column 317
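Before working around the error, it can help to look at the exact line the parser rejects. ElementTree's `ParseError` has a `position` attribute holding the same (line, column) pair shown in the message. Below is a minimal sketch; `locate_parse_error` is a hypothetical helper, and the sample document is made up, with a bare `&` (one common cause of "invalid token") on line 3.

```python
import xml.etree.ElementTree as ET


def locate_parse_error(xml_text):
    """Return (line, column, offending_line) for the first parse error, or None."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        line, column = err.position  # same numbers as in the error message
        offending = xml_text.splitlines()[line - 1]  # lines are 1-based
        return line, column, offending
    return None


# Hypothetical document: the unescaped "&" on line 3 is not well-formed.
sample = "<root>\n<a>ok</a>\n<b>&</b>\n</root>"
print(locate_parse_error(sample))
```

For the real file, pass in its text, e.g. `open(r'my.xml', encoding='utf-8').read()` (the UTF-8 encoding is an assumption), and inspect the returned line around column 317.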
So I used BeautifulSoup to read the XML instead, with this code:
    from bs4 import BeautifulSoup

    with open(r'my.xml') as fp:
        soup = BeautifulSoup(fp, 'xml')
If I print soup, it looks like this:
    <Placemark>
      <name>India </name>
      <description>Country</description>
      <styleUrl>#icon-962-B29189</styleUrl>
    </Placemark>
    <Placemark>
      <name>USA</name>
      <styleUrl>#icon-962-B29189</styleUrl>
    </Placemark>
    <Placemark>
      <description>City</description>
      <styleUrl>#icon-962-B29189</styleUrl>
    </Placemark>
I have more than 100 Placemark tags in total, each holding this kind of information. I want to capture the name and description of each Placemark and build a DataFrame (df) with one column for each.
My code for this is:

    name_tag = [x.text.strip() for x in soup.find_all('name')]
    description_tag = [x.text.strip() for x in soup.find_all('description')]
The problem is that some Placemark tags have no description tag at all. Because the two lists are collected independently, a missing tag shifts every later entry, so the names and descriptions no longer line up and I cannot tell which name has which description.
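One way to keep the pairs aligned is to loop over the Placemark tags and search for name and description inside each one, so a missing tag produces None for that Placemark instead of shifting a list. A minimal sketch, assuming the 'xml' parser (which needs lxml installed) and the tag names from the sample; the helper `placemark_records` is hypothetical. The sample is wrapped in a hypothetical `<Document>` root because an XML parser may keep only the first top-level element of a fragment with several roots.

```python
from bs4 import BeautifulSoup


def placemark_records(soup):
    """Return one dict per <Placemark>; a missing child tag becomes None."""
    records = []
    for placemark in soup.find_all('Placemark'):
        # Search only inside this Placemark, so name and description stay paired.
        name = placemark.find('name')
        description = placemark.find('description')
        records.append({
            'Name': name.get_text(strip=True) if name is not None else None,
            'Description': (description.get_text(strip=True)
                            if description is not None else None),
        })
    return records


# The sample from the question, wrapped in a hypothetical <Document> root.
sample = """<Document>
<Placemark> <name>India </name> <description>Country</description> <styleUrl>#icon-962-B29189</styleUrl> </Placemark>
<Placemark> <name>USA</name> <styleUrl>#icon-962-B29189</styleUrl> </Placemark>
<Placemark> <description>City</description> <styleUrl>#icon-962-B29189</styleUrl> </Placemark>
</Document>"""

soup = BeautifulSoup(sample, 'xml')  # the 'xml' feature is backed by lxml
for record in placemark_records(soup):
    print(record)
```

With the real file, the soup built from `my.xml` can be passed to the same function. The 'xml' parser is case-sensitive, so the search must use `'Placemark'` exactly as written in the file.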
Expected output DataFrame:

    Name    Description
    India   Country
    USA
            City
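Once each Placemark is a single record, pandas can build this table directly, and missing values show up as NaN/None rather than misaligning the rows. A minimal sketch, assuming the records below (hand-written from the sample, with None marking a missing tag):

```python
import pandas as pd

# One dict per Placemark from the sample; None marks a missing tag.
records = [
    {'Name': 'India', 'Description': 'Country'},
    {'Name': 'USA', 'Description': None},
    {'Name': None, 'Description': 'City'},
]

# Passing columns fixes the column order regardless of the dicts' contents.
df = pd.DataFrame(records, columns=['Name', 'Description'])
print(df)
```

If empty cells are preferred over missing-value markers, `df.fillna('')` replaces them with empty strings.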
Is there any way I can achieve this?