I want to read an .xlsx file as a pandas DataFrame from an FTP connection. However, I want to do this in memory, without writing the .xlsx to my local disk.
Here is my current code:
import ftplib
import pandas as pd
from io import BytesIO
ftp = ftplib.FTP("host")
ftp.login("ftp_111", "hs12121")
ftp.dir()
listff = ftp.nlst()
flo = BytesIO()
for filename in listff:
    try:
        ftp.retrbinary('RETR ' + filename, flo.write, 1024)
        flo.seek(0)
        df = pd.read_excel(flo)
    except Exception as e:
        print("An exception occurred: ", e)
But I get this error:
KeyError: "There is no item named 'xl/sharedStrings.xml' in the archive"
How can I solve it?
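I guess you get the error on the second iteration, because you do not reset the flo before ftp.retrbinary. The buffer can then still contain bytes from the previous file, so what pandas reads may no longer be a valid .xlsx archive. I suggest you move the flo = BytesIO() right before the ftp.retrbinary call, so that every file gets a fresh buffer. (I've also removed the [blocksize=]1024 argument; what is its point?)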
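A minimal sketch of that change, assuming the rest of the question's code stays as it is. The helper names download_to_memory and read_all_excel are hypothetical and only there to keep the example self-contained; the essential part is that BytesIO() is created inside the per-file step rather than once before the loop.

```python
import ftplib
from io import BytesIO


def download_to_memory(ftp, filename):
    """Download one file from an open FTP connection into a fresh in-memory buffer."""
    # Hypothetical helper: a new buffer per file, so nothing is left over
    # from a previous download.
    flo = BytesIO()
    ftp.retrbinary('RETR ' + filename, flo.write)  # default block size
    flo.seek(0)  # rewind so the reader starts at the first byte
    return flo


def read_all_excel(ftp):
    """Try to read every file in the current FTP directory with pandas."""
    import pandas as pd  # imported here so download_to_memory works without pandas

    frames = {}
    for filename in ftp.nlst():
        try:
            frames[filename] = pd.read_excel(download_to_memory(ftp, filename))
        except Exception as e:
            print("An exception occurred: ", e)
    return frames


# Usage (host and credentials are placeholders):
# ftp = ftplib.FTP("host")
# ftp.login("user", "password")
# frames = read_all_excel(ftp)
```

Returning a dict keyed by filename is just one way to keep all the results; the question's loop overwrote df on every iteration, so only the last file would have survived.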