Read .xlsx as pandas DataFrame from FTP without writing to disk

I want to read an .xlsx file from an FTP connection into a pandas DataFrame. However, I want to do this in memory, without writing the .xlsx to my local disk.

Here is my current code:

import ftplib
import pandas as pd
from io import BytesIO

ftp = ftplib.FTP("host") 
ftp.login("ftp_111", "hs12121") 
ftp.dir()

listff = ftp.nlst()
flo = BytesIO()

for filename in listff:
    try:
        ftp.retrbinary('RETR ' + filename, flo.write, 1024)
        flo.seek(0)
        df = pd.read_excel(flo)

    except Exception as e:
        print("An exception occurred: ", e)

It fails with the following error:

KeyError: "There is no item named 'xl/sharedStrings.xml' in the archive"

How can I solve it?

There is 1 answer

Martin Prikryl

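I guess you get the error on the second iteration, because you do not reset flo before calling ftp.retrbinary. The buffer is created once, outside the loop, so it still holds bytes from the previous file when the next download is written into it, and pd.read_excel is then handed a corrupt archive. I suggest you move flo = BytesIO() to right before the ftp.retrbinary call: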

flo = BytesIO()
ftp.retrbinary('RETR ' + filename, flo.write)

(I've also removed the [blocksize=]1024 – what is its point?)
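
Put together, the whole loop might look like the sketch below. It is only an illustration of the fix above, not tested against a real server: the helper names download_to_buffer and read_xlsx_files are made up for this example, and the .xlsx filter is an added assumption, since ftp.nlst() can also return entries that are not Excel workbooks.

```python
from io import BytesIO


def download_to_buffer(ftp, filename):
    """Download one remote file into a fresh in-memory buffer (hypothetical helper)."""
    flo = BytesIO()  # new buffer per file, so nothing from a previous download remains
    ftp.retrbinary("RETR " + filename, flo.write)
    flo.seek(0)  # rewind so the reader starts at the first byte
    return flo


def read_xlsx_files(ftp):
    """Return {filename: DataFrame} for the .xlsx files in the current directory."""
    import pandas as pd  # imported here so download_to_buffer works without pandas

    frames = {}
    for filename in ftp.nlst():
        # Assumption: only names ending in .xlsx are workbooks worth parsing.
        if not filename.lower().endswith(".xlsx"):
            continue
        frames[filename] = pd.read_excel(download_to_buffer(ftp, filename))
    return frames


# Usage, with the placeholder host and credentials from the question:
#
#     import ftplib
#     with ftplib.FTP("host") as ftp:
#         ftp.login("ftp_111", "hs12121")
#         frames = read_xlsx_files(ftp)
```

Returning a dict keyed by filename keeps every DataFrame, whereas the loop in the question overwrites df on each iteration.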