I have been using the package pandas_read_xml to read XML files into pandas DataFrames. Lately, however, the package has started behaving very strangely: the XML parser occasionally crashes, but the same call works on repeated attempts. I am really puzzled by this and hope someone here can help me understand what is going on. The problem is illustrated below.
import pandas as pd
import pandas_read_xml as pdx
data = pdx.read_xml('https://www.sec.gov/Archives/edgar/data/1000351/000114554921012283/primary_doc.xml', ['edgarSubmission'])
This occasionally raises the error “ExpatError: mismatched tag: line 50, column 124”, yet the same call works fine when repeated. I see the same behavior with other URLs. I have checked the XML file and found nothing wrong with it. The traceback contains the following:
  File "<ipython-input-118-c68fdb3a2633>", line 1, in <module>
    data = pdx.read_xml('https://www.sec.gov/Archives/edgar/data/1002537/000114554921006264/primary_doc.xml',['edgarSubmission'])
  File "C:\Users\A1610222\AppData\Local\Continuum\anaconda2\lib\site-packages\pandas_read_xml.py", line 33, in read_xml
    return read_xml_as_dataframe(read_xml_from_url(path_or_xml), root_key_list, root_is_rows=root_is_rows, transpose=transpose)
  File "C:\Users\A1610222\AppData\Local\Continuum\anaconda2\lib\site-packages\pandas_read_xml.py", line 62, in read_xml_as_dataframe
    return pd.DataFrame([get_to_root_in_dict(xmltodict.parse(xml), root_key_list)])
  File "C:\Users\A1610222\AppData\Local\Continuum\anaconda2\lib\site-packages\xmltodict.py", line 327, in parse
    parser.Parse(xml_input, True)
ExpatError: mismatched tag: line 50, column 124
The traceback points to lines 33 and 62 of pandas_read_xml, and from there to the xmltodict parser. I have uninstalled and reinstalled the package to rule out a corrupted installation, but the problem persists. Please excuse my ignorance if I am missing something elementary, and let me know if anything is unclear. Thank you in advance for your help.
I discovered today that the problem was caused by a connectivity issue and had nothing to do with the package or the structure of the XML files.
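For anyone who runs into the same intermittent error, one way to guard against a flaky connection is to download the document yourself, check that it is well-formed, and retry if it is not. The sketch below is only an illustration, not part of pandas_read_xml: it assumes Python 3, uses only the standard library, and the helper names download_text and fetch_well_formed_xml are made up for this example. The idea is that an incomplete or failed download can show up as a parse error, so the check happens before the text is handed to any other parser.

```python
import time
import urllib.request
import xml.etree.ElementTree as ET


def download_text(url, timeout=30):
    """Download a URL and return the body as text (hypothetical helper)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


def fetch_well_formed_xml(url, fetch=download_text, attempts=3, delay=1.0):
    """Return the XML text at `url`, retrying when the download or the
    well-formedness check fails (hypothetical helper)."""
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            text = fetch(url)
            # Raises ET.ParseError if the text is not well-formed XML,
            # for example because the download was cut short.
            ET.fromstring(text)
            return text
        except (OSError, ET.ParseError) as error:
            # Network failures from urllib are subclasses of OSError.
            last_error = error
            if attempt < attempts:
                time.sleep(delay)
    raise RuntimeError(
        f"no well-formed XML from {url} after {attempts} attempts"
    ) from last_error
```

The returned text could then be passed to xmltodict.parse, which is the call the traceback shows failing. The fetch parameter is there so the download step can be swapped out, for instance to add request headers or to test the retry logic without a network.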