I need to read the contents of an excel file stored in sharepoint and convert it into a dataframe without saving it on my pc. But at this point I need assistance to point me in the right direction
The code goes like this:
- login on sharepoint
- get the content of an .xlsx file
- load that content into a pandas dataframe
- return the dataframe object
I'm using Office365-REST-Python-Client library I do not know where is my mistake but at the end I got the following error:
zipfile.BadZipFile: File is not a zip file
here is the code:
import asyncio
import io
import json
from office365.runtime.auth.user_credential import UserCredential
from office365.runtime.http.request_options import RequestOptions
from office365.sharepoint.client_context import ClientContext
from office365.sharepoint.files.file import File
import pandas as pd
SITEURL="https://{my company name}.sharepoint.com/sites/{my site}"
FILE_URL="Documentos%20compartidos/Test/Test2.xlsx"
PASSWORD="{my password}"
USERNAME="{my user name/ mail}"
# sharepoint core login
async def loging():
username=USERNAME
password=PASSWORD
site_url=SITEURL
file_url=FILE_URL
ctx = ClientContext(site_url).with_credentials(UserCredential(f"{username}", f"{password}"))
request = RequestOptions("{0}/_api/web/".format(site_url))
response = ctx.pending_request().execute_request_direct(request)
ajson = json.loads(response.content)
web_title = ajson['d']['Title']
print(web_title) # this print help me to confirm that I logged in
ctx.load(ctx.web)
ctx.execute_query()
response = File.open_binary(ctx, file_url)
bytes_file_obj = io.BytesIO()
bytes_file_obj.write(response.content)
bytes_file_obj.seek(0)
df = pd.read_excel(bytes_file_obj, engine='openpyxl') #<-- here I got the error.
# without the "engine='openpyxl'", I've got another error asking for an explicit engine
return df
Thanks in advance.