I am using the Google Drive API to get file revisions, then I am interested in reading (not downloading) the contents of each file revision to a pandas dataframe. I have been able to get the revision Ids, but reading the contents is what is problematic. I have tried the following code and getting a googleapiclient.errors.HttpError. If I change it from get_media to just get which returns alt=json, I do not get the file contents but the revision metadata. Will appreciate help here:
import io
import pandas as pd
from google_apis import create_service
from googleapiclient.http import MediaIoBaseDownload
import urllib
import csv
import requests
CLIENT_FILE = 'client_secret.json'
API_NAME = 'drive'
API_VERSION = 'v3'
SCOPES = ['https://www.googleapis.com/auth/drive']
# Retrieve file revision history
service = create_service(CLIENT_FILE, API_NAME, API_VERSION, SCOPES)
def get_file_revision_history(file_id):
response = service.revisions().list(
fileId=file_id,
fields='*',
pageSize=1000
).execute()
revisions = response.get('revisions')
nextPageToken = response.get('nextPageToken')
while nextPageToken:
response = service.revisions().list(
fileId=file_id,
fields='*',
pageSize=1000,
pageToken=nextPageToken
).execute()
revisions = response.get('revisions')
nextPageToken = response.get('nextPageToken')
return revisions
file_id = '1E8Wbd80CbFlFSHYZQkApXMM9EQOz1lQRl4m3rfq-vdY'
revision_history = get_file_revision_history(file_id)
print(revision_history)
df = pd.json_normalize(revision_history)
#df.to_csv('revision history ({0}).csv'.format(file_id), index=False)
#read the file contents
revision_history_id = '104'
res = service.revisions().get_media(
fileId = file_id,
revisionId = revision_history_id
).execute()
#uri = res.uri
print(res)
I thought that when I saw your file ID and your revision ID of
'104', the file might be Google Docs (Document, Spreadsheet, Slides, and so on). In this case, unfortunately,get_mediamethod cannot be used. Here,exportmethod is required to be used. But, unfortunately, it seems thatrevisions()has no method ofexport. So, in this case, the export link retrieved fromservice.revisions().get()` is used for downloading the data.And, your goal is to put the exported values on pandas. From this, I guessed that the file of your file ID might be Google Spreadsheet. If my understanding is correct, when this is reflected in a python script, how about the following sample script?
Sample script:
Here, the access token is retrieved from
credsofservice = build("drive", "v3", credentials=creds). Please be careful about this.In this case,
from io import BytesIO,import pandas as pd, andimport requestsare also used.When this script is run, the Google Spreadsheet is exported with the revision ID as XLSX format. And, the values from the expected sheet are put to pandas.
Reference: