Gmail API - Quickly access the dates of every email ever sent / received

I'm trying to analyse my 25k+ emails, similar to the post here: http://beneathdata.com/how-to/email-behavior-analysis/

While that script used IMAP, I'm trying to implement this with the Gmail API for improved security. I'm using Python (and Pandas for data analysis), but the question applies more generally to use of the Gmail API.

Following the docs, I'm able to retrieve a list of messages using:

msgs = service.users().messages().list(userId='me', maxResults=500).execute()

and then access the data using a loop:

for msg in msgs['messages']:
    m_id = msg['id'] # get id of individual message
    message = service.users().messages().get(userId='me', id=m_id).execute()
    payload = message['payload'] 
    header = payload['headers']

    for item in header:
        if item['name'] == 'Date':
            date = item['value']
            # ... data storage functions etc. ...

but this is clearly very slow. In addition to looping over every message, I have to call list() many times to page through all the emails.
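For reference, the repeated list() calls can be wrapped in a small helper that follows nextPageToken until it runs out. This is a minimal sketch, assuming `service` is the authorized Gmail API client used above; `list_all_message_ids` is a hypothetical helper name, not part of the library:

```python
def list_all_message_ids(service, user_id='me', page_size=500):
    """Collect every message id by following nextPageToken across pages."""
    ids = []
    page_token = None
    while True:
        kwargs = {'userId': user_id, 'maxResults': page_size}
        if page_token:
            # Only send pageToken once the API has handed one back.
            kwargs['pageToken'] = page_token
        response = service.users().messages().list(**kwargs).execute()
        # A page with no matches may omit the 'messages' key entirely.
        ids.extend(m['id'] for m in response.get('messages', []))
        page_token = response.get('nextPageToken')
        if not page_token:
            return ids
```

This only gathers ids; each message still has to be fetched separately to get its date.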

Is there a higher-performance way to do this? For example, can I ask the API to return only the date rather than all the unwanted message information?

Thanks.

Reference: https://developers.google.com/resources/api-libraries/documentation/gmail/v1/python/latest/gmail_v1.users.messages.html

payne On BEST ANSWER

You can combine your messages.get() calls into a single batch request; see: https://developers.google.com/gmail/api/guides/batch

You can put up to 100 requests into a batch.

Note that "a set of n requests batched together counts toward your usage limit as n requests, not as one request." So you may need to do some pacing to stay below request rate limits.
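One way to do that pacing is to split the ids into chunks of at most 100 and pause between chunks. This is a rough sketch: `chunked`, `run_in_batches`, and the one-second pause are illustrative assumptions rather than values from the Gmail docs, and `run_batch` stands for whatever function builds and executes one batch:

```python
import time

def chunked(ids, size=100):
    """Yield successive slices of at most `size` ids."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]

def run_in_batches(ids, run_batch, size=100, pause_s=1.0, sleep=time.sleep):
    """Call run_batch(chunk) for each chunk, sleeping between chunks."""
    for index, chunk in enumerate(chunked(ids, size)):
        if index:
            # Pause before every chunk except the first to spread out requests.
            sleep(pause_s)
        run_batch(chunk)
```

The right pause depends on your quota, so treat the default as a starting point to tune.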

Here's a rough Python example that fetches the messages whose ids are in the list id_list:

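msgs = []

def fetch(rid, response, exception):
    # Batch callback: called once per request with its id, response and any error.
    if exception is not None:
        print(exception)
    else:
        msgs.append(response)

# Build one batch request; gmail is the authorized Gmail service object.
batch = gmail.new_batch_http_request()
for message_id in id_list:
    # fmt is the message format to request: 'full', 'metadata', 'minimal' or 'raw'.
    t = gmail.users().messages().get(userId='me', id=message_id, format=fmt)
    batch.add(t, callback=fetch)

# http is the authorized HTTP object the service was built with.
batch.execute(http=http)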
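To get only the date rather than the whole message, messages.get() accepts format='metadata' together with metadataHeaders=['Date']. The response also includes internalDate (epoch milliseconds as a string), which avoids parsing the header at all. A sketch under those assumptions: `header_date`, `internal_date`, and `fetch_dates` are hypothetical helper names, and `service` is the authorized client (called gmail above):

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

def header_date(message):
    """Return the Date header of a messages.get response as a datetime, or None."""
    for header in message.get('payload', {}).get('headers', []):
        if header['name'].lower() == 'date':
            return parsedate_to_datetime(header['value'])
    return None

def internal_date(message):
    """Convert internalDate (epoch milliseconds, as a string) to a UTC datetime."""
    return datetime.fromtimestamp(int(message['internalDate']) / 1000, tz=timezone.utc)

def fetch_dates(service, id_list):
    """Batch-fetch only the Date header for a list of up to 100 message ids."""
    dates = {}

    def on_response(request_id, response, exception):
        # Failed requests are skipped here; real code should log or retry them.
        if exception is None:
            dates[response['id']] = header_date(response)

    batch = service.new_batch_http_request(callback=on_response)
    for message_id in id_list:
        batch.add(service.users().messages().get(
            userId='me', id=message_id,
            format='metadata', metadataHeaders=['Date']))
    batch.execute()
    return dates
```

A malformed Date header makes parsedate_to_datetime raise, so internal_date may be the more robust choice for bulk analysis.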