Strange mismatched tag error with pandas_read_xml

I have been using the pandas_read_xml package to read XML files into pandas DataFrames. Lately, however, it has started behaving very strangely: the XML parser occasionally crashes, yet the same call succeeds when I simply run it again. I am really puzzled by this and hope someone here can help me understand it. A minimal example of the problem:

  import pandas as pd
  import pandas_read_xml as pdx

  data = pdx.read_xml('https://www.sec.gov/Archives/edgar/data/1000351/000114554921012283/primary_doc.xml', ['edgarSubmission'])

This occasionally raises “ExpatError: mismatched tag: line 50, column 124”, but the same call works fine on repeated attempts. I see similar behavior with other URLs. I have checked that nothing is wrong with the XML file itself. The traceback contains the following:

 File "<ipython-input-118-c68fdb3a2633>", line 1, in <module>
 data = pdx.read_xml('https://www.sec.gov/Archives/edgar/data/1002537/000114554921006264/primary_doc.xml',['edgarSubmission'])

 File "C:\Users\A1610222\AppData\Local\Continuum\anaconda2\lib\site-packages\pandas_read_xml.py", line 33, in read_xml
   return read_xml_as_dataframe(read_xml_from_url(path_or_xml), root_key_list, root_is_rows=root_is_rows, transpose=transpose)

 File "C:\Users\A1610222\AppData\Local\Continuum\anaconda2\lib\site-packages\pandas_read_xml.py", line 62, in read_xml_as_dataframe
   return pd.DataFrame([get_to_root_in_dict(xmltodict.parse(xml), root_key_list)])

 File "C:\Users\A1610222\AppData\Local\Continuum\anaconda2\lib\site-packages\xmltodict.py", line 327, in parse
   parser.Parse(xml_input, True)

 ExpatError: mismatched tag: line 50, column 124

The traceback points to lines 33 and 62 of pandas_read_xml. I have uninstalled and reinstalled the package to rule out a broken installation, but the problem persists. Please excuse me if I am missing something elementary, and let me know if anything is unclear. Thanks in advance for your help.

There is 1 answer

Answered by stump

I discovered today that the problem was caused by a connectivity issue and had nothing to do with the package or the structure of the XML files.
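
If the connection is flaky, one possible workaround is to download the document yourself, check that it is well-formed, and retry when it is not, before handing it to any parser. The following is only a minimal sketch under several assumptions: it targets Python 3, the names fetch_bytes and fetch_well_formed_xml are hypothetical helpers rather than part of pandas_read_xml, and the User-Agent string, retry count and delay are placeholders to adjust.

```python
import http.client
import time
import urllib.request
import xml.etree.ElementTree as ET


def fetch_bytes(url, timeout=30):
    """Download a URL and return the raw body (hypothetical helper)."""
    # The User-Agent value is a placeholder, not a required or recommended string.
    request = urllib.request.Request(url, headers={"User-Agent": "example-script"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()


def fetch_well_formed_xml(url, fetch=fetch_bytes, retries=3, delay=1.0):
    """Return the XML at `url`, retrying if the download fails or is not well-formed."""
    last_error = None
    for attempt in range(retries):
        try:
            body = fetch(url)
            # Raises ET.ParseError if the body is truncated or is not XML at all.
            ET.fromstring(body)
            return body
        except (OSError, http.client.HTTPException, ET.ParseError) as error:
            last_error = error
            if attempt < retries - 1:
                time.sleep(delay)  # wait briefly before the next attempt
    raise RuntimeError(
        "no well-formed XML from %s after %d attempts" % (url, retries)
    ) from last_error
```

The returned bytes can then be passed to an XML parser; the traceback in the question shows pandas_read_xml calling xmltodict.parse on the downloaded text, so that is one candidate. When all attempts fail, the RuntimeError carries the last underlying error as its cause, which makes it easier to tell a network failure from a body that really is malformed.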