pandas.read_excel() cannot read an .xls file: what is going wrong?

I am using pandas 2.2.0 with xlrd 2.0.1. This code snippet

import pandas as pd
filepath = './data/myfile.xls'
df = pd.read_excel(filepath)

produces the following traceback:

Traceback (most recent call last):
  File "/home/singhd/PycharmProjects/Debugging/main.py", line 30, in <module>
    df = pd.read_excel(file_path)
         ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/singhd/PycharmProjects/Debugging/.venv/lib/python3.11/site-packages/pandas/io/excel/_base.py", line 495, in read_excel
    io = ExcelFile(
         ^^^^^^^^^^
  File "/home/singhd/PycharmProjects/Debugging/.venv/lib/python3.11/site-packages/pandas/io/excel/_base.py", line 1567, in __init__
    self._reader = self._engines[engine](
                   ^^^^^^^^^^^^^^^^^^^^^^
  File "/home/singhd/PycharmProjects/Debugging/.venv/lib/python3.11/site-packages/pandas/io/excel/_xlrd.py", line 46, in __init__
    super().__init__(
  File "/home/singhd/PycharmProjects/Debugging/.venv/lib/python3.11/site-packages/pandas/io/excel/_base.py", line 573, in __init__
    self.book = self.load_workbook(self.handles.handle, engine_kwargs)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/singhd/PycharmProjects/Debugging/.venv/lib/python3.11/site-packages/pandas/io/excel/_xlrd.py", line 63, in load_workbook
    return open_workbook(file_contents=data, **engine_kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/singhd/PycharmProjects/Debugging/.venv/lib/python3.11/site-packages/xlrd/__init__.py", line 172, in open_workbook
    bk = open_workbook_xls(
         ^^^^^^^^^^^^^^^^^^
  File "/home/singhd/PycharmProjects/Debugging/.venv/lib/python3.11/site-packages/xlrd/book.py", line 68, in open_workbook_xls
    bk.biff2_8_load(
  File "/home/singhd/PycharmProjects/Debugging/.venv/lib/python3.11/site-packages/xlrd/book.py", line 637, in biff2_8_load
    cd = compdoc.CompDoc(self.filestr, logfile=self.logfile,
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/singhd/PycharmProjects/Debugging/.venv/lib/python3.11/site-packages/xlrd/compdoc.py", line 227, in __init__
    dbytes = self._get_stream(
             ^^^^^^^^^^^^^^^^^
  File "/home/singhd/PycharmProjects/Debugging/.venv/lib/python3.11/site-packages/xlrd/compdoc.py", line 293, in _get_stream
    if self.seen[s]:
       ~~~~~~~~~^^^
IndexError: array index out of range
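The exception is raised at `self.seen[s]` in xlrd's `compdoc.py`, apparently while xlrd follows a chain of sectors through the OLE2 compound file that wraps a legacy .xls workbook. One possible, unconfirmed explanation is that a sector id in such a chain points past the end of the data, for example in a truncated or unusually written file that Excel tolerates. A minimal stdlib sketch to check that idea for the directory chain, assuming the standard MS-CFB header layout (sector shift at offset 0x1E, FAT sector count at 0x2C, first directory sector at 0x30, the first 109 FAT sector locations at 0x4C); `check_directory_chain` is a hypothetical helper, not part of xlrd or pandas:

```python
import struct

OLE2_SIGNATURE = bytes.fromhex("D0CF11E0A1B11AE1")  # legacy .xls files are OLE2 compound files
MAXREGSECT = 0xFFFFFFFA  # MS-CFB: ids above this are reserved values or markers (end of chain, free, ...)


def check_directory_chain(data: bytes) -> dict:
    """Follow the OLE2 directory sector chain and flag sector ids past the end of the data.

    Assumes the MS-CFB header layout. Only the (at most 109) FAT sector locations
    stored in the header are used, so FATs continued in DIFAT sectors are not covered.
    """
    if len(data) < 512 or not data.startswith(OLE2_SIGNATURE):
        raise ValueError("not an OLE2 compound file")

    # Little-endian header fields: sector shift, FAT sector count, first directory sector.
    (sector_shift,) = struct.unpack_from("<H", data, 0x1E)
    num_fat_sectors, first_dir_sector = struct.unpack_from("<II", data, 0x2C)
    header_difat = struct.unpack_from("<109I", data, 0x4C)
    sector_size = 1 << sector_shift

    # Sector N starts at byte (N + 1) * sector_size; a partial last sector still counts.
    data_len = max(len(data) - sector_size, 0)
    physical_sectors = -(-data_len // sector_size)

    # Build the FAT from those FAT sectors that actually lie inside the data.
    fat = []
    missing_fat_sectors = []
    for fat_sid in header_difat[:num_fat_sectors]:
        if fat_sid >= physical_sectors:
            missing_fat_sectors.append(fat_sid)
            continue
        start = (fat_sid + 1) * sector_size
        chunk = data[start:start + sector_size]
        fat.extend(struct.unpack_from(f"<{len(chunk) // 4}I", chunk))

    # Walk the directory chain, stopping at an end marker, an unusable id, or a loop.
    chain = []
    sid = first_dir_sector
    while sid <= MAXREGSECT and sid not in chain:
        chain.append(sid)
        if sid >= physical_sectors or sid >= len(fat):
            break
        sid = fat[sid]

    return {
        "sector_size": sector_size,
        "physical_sectors": physical_sectors,
        "missing_fat_sectors": missing_fat_sectors,
        "directory_chain": chain,
        "out_of_range": [s for s in chain if s >= physical_sectors],
    }
```

Usage would be along the lines of `with open(filepath, "rb") as fh: print(check_directory_chain(fh.read()))`. A non-empty `out_of_range` or `missing_fat_sectors` list would support the truncation theory; an empty one does not rule it out, since the sketch only follows the directory chain and not the other streams xlrd reads.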

If I open the file in Excel and save it as .xlsx, pandas reads the new file without problems, so the .xls file does not seem to be corrupt. What is going wrong here? What else can I try? Is this a well-known issue?
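Because the data loads once Excel has re-saved the file, one more thing to try is a different reader. pandas 2.2 added a `calamine` engine for `read_excel` (it needs the separate `python-calamine` package) that also handles .xls; whether it copes with this particular file is untested. A small sketch that tries engines in order and reports every failure; `read_excel_with_fallback` and its `reader` parameter are hypothetical, the latter existing only so the loop can be exercised with a stub instead of pandas:

```python
from typing import Any, Callable, Dict, Optional, Sequence, Tuple


def read_excel_with_fallback(
    path: str,
    engines: Sequence[str] = ("xlrd", "calamine"),
    reader: Optional[Callable[..., Any]] = None,
) -> Tuple[str, Any]:
    """Try pandas Excel engines in order; return (engine, result) from the first that succeeds."""
    if reader is None:
        import pandas as pd  # imported lazily so a stub reader can be used without pandas

        reader = pd.read_excel
    failures: Dict[str, str] = {}
    for engine in engines:
        try:
            return engine, reader(path, engine=engine)
        except Exception as exc:  # broad on purpose: each engine fails in its own way
            failures[engine] = f"{type(exc).__name__}: {exc}"
    details = "; ".join(f"{name} -> {msg}" for name, msg in failures.items())
    raise RuntimeError(f"no engine could read {path!r}: {details}")
```

For example, `engine, df = read_excel_with_fallback(filepath)`. If `python-calamine` is not installed, pandas raises an ImportError for that engine, which simply shows up in the combined error message.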
