I'm trying to analyse my 25k+ emails, similar to the post here: http://beneathdata.com/how-to/email-behavior-analysis/
While the mentioned script used IMAP, I'm trying to implement this using the Gmail API for improved security. I'm using Python (and Pandas for data analysis) but the question applies more generally to use of the Gmail API.
From the docs, I'm able to read emails in using:
msgs = service.users().messages().list(userId='me', maxResults=500).execute()
and then access the data using a loop:
for msg in msgs['messages']:
    m_id = msg['id']  # get id of individual message
    message = service.users().messages().get(userId='me', id=m_id).execute()
    payload = message['payload']
    header = payload['headers']
    for item in header:
        if item['name'] == 'Date':
            date = item['value']
    # ... data storage functions etc. ...
but this is clearly very slow: it makes one get() request per message, and I also have to call list() many times to page through all the emails.
Is there a higher-performance way to do this, e.g. to ask the API to return only the data I need rather than all the unwanted message information?
Thanks.
You can combine your messages.get() operations into a batch; see: https://developers.google.com/gmail/api/guides/batch
You can put up to 100 requests into a batch.
Note that "a set of n requests batched together counts toward your usage limit as n requests, not as one request." So you may need to do some pacing to stay below request rate limits.
Here's a rough Python example that will fetch the messages whose ids are given in a list, id_list:
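What follows is a minimal sketch of the batching idea, not a definitive implementation. It assumes `service` is an authorized Gmail client built with google-api-python-client, as in the question. The helper names (`chunked`, `list_all_message_ids`, `fetch_dates`) and the `pause_seconds` pacing value are made up for illustration. It also asks for `format='metadata'` with `metadataHeaders=['Date']`, so each response should carry only the Date header rather than the full message.

```python
import time


def chunked(items, size):
    """Yield successive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def list_all_message_ids(service, user_id='me', page_size=500):
    """Page through messages.list() and collect every message id."""
    ids = []
    request = service.users().messages().list(userId=user_id,
                                              maxResults=page_size)
    while request is not None:
        response = request.execute()
        ids.extend(m['id'] for m in response.get('messages', []))
        # list_next() returns None once there is no further page.
        request = service.users().messages().list_next(request, response)
    return ids


def fetch_dates(service, id_list, user_id='me', batch_size=100,
                pause_seconds=1.0):
    """Fetch the Date header for each id, sending up to 100 gets per batch.

    Returns (dates, errors), both keyed by message id.
    pause_seconds is an assumed pacing value; tune it to your quota.
    """
    dates = {}
    errors = {}

    def on_response(request_id, response, exception):
        # Called once per request in the batch.
        if exception is not None:
            errors[request_id] = exception
            return
        for header in response['payload']['headers']:
            if header['name'] == 'Date':
                dates[request_id] = header['value']

    for chunk in chunked(id_list, batch_size):
        batch = service.new_batch_http_request(callback=on_response)
        for m_id in chunk:
            request = service.users().messages().get(
                userId=user_id, id=m_id,
                format='metadata', metadataHeaders=['Date'])
            # Use the message id as the request id so the callback can map
            # each response back to its message.
            batch.add(request, request_id=m_id)
        batch.execute()
        # Each batched request still counts toward the rate limit.
        time.sleep(pause_seconds)
    return dates, errors
```

Usage would be roughly `id_list = list_all_message_ids(service)` followed by `dates, errors = fetch_dates(service, id_list)`. Anything that ends up in `errors` (for example, rate-limit failures) can be retried in a later pass.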