How to get the text of the email body?

I have this code but I don't actually get the email text.

Have I got to decode the email text?

import sys
import imaplib
import getpass
import email
import email.header
from email.header import decode_header
import base64

def read(username, password, sender_of_interest):
    # Login to INBOX
    imap = imaplib.IMAP4_SSL("imap.mail.com", 993)
    imap.login(username, password)
    imap.select('INBOX')
    # Use search(), not status()
    # Print all unread messages from a certain sender of interest
    if sender_of_interest:
        status, response = imap.uid('search', None, 'UNSEEN', 'FROM {0}'.format(sender_of_interest))
    else:
        status, response = imap.uid('search', None, 'UNSEEN')
    if status == 'OK':
        unread_msg_nums = response[0].split()
    else:
        unread_msg_nums = []
    data_list = []
    for e_id in unread_msg_nums:
        data_dict = {}
        e_id = e_id.decode('utf-8')
        _, response = imap.uid('fetch', e_id, '(RFC822)')
        html = response[0][1].decode('utf-8')
        email_message = email.message_from_string(html)
        data_dict['mail_to'] = email_message['To']
        data_dict['mail_subject'] = email_message['Subject']
        data_dict['mail_from'] = email.utils.parseaddr(email_message['From'])
        #data_dict['body'] = email_message.get_payload()[0].get_payload()
        data_dict['body'] = email_message.get_payload()
        data_list.append(data_dict)
    print(data_list)
    # Mark them as seen
    #for e_id in unread_msg_nums:
        #imap.store(e_id, '+FLAGS', '\Seen')
    imap.logout()    
    return data_dict

So I do this:

print('Getting the email text bodies ... ')
emailData = read(usermail, pw, sender_of_interest)
print('Got the data!')
for key in emailData.keys():
    print(key, emailData[key])

The output is:

mail_to [email protected]
mail_subject Get json file
mail_from ('Pedro Rodriguez', '[email protected]')
body [<email.message.Message object at 0x7f7d9f928df0>, <email.message.Message object at 0x7f7d9f928f70>]

How to actually get the email text?

There are 2 answers

2
tripleee On BEST ANSWER

Depending on what exactly you mean by "the text", you probably want the get_body method. But you are thoroughly mangling the email before you get to that point. What you receive from the server isn't "HTML" and converting it to a string to then call message_from_string on it is roundabout and error-prone. What you get are bytes; use the message_from_bytes method directly. (This avoids all kinds of problems when the bytes are not UTF-8; the message_from_string method only really made sense back in Python 2, which didn't have explicit bytes.)

from email.policy import default
...

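        _, response = imap.uid('fetch', e_id, '(RFC822)')
        # Parse the raw bytes directly; policy=default yields an EmailMessage
        email_message = email.message_from_bytes(
            response[0][1], policy=default)
        # Prefer the HTML body, fall back to plain text
        # (the valid preference names are 'related', 'html' and 'plain')
        body_part = email_message.get_body(
            preferencelist=('html', 'plain'))
        # get_body() returns None when no matching part exists
        body = body_part.get_content() if body_part is not None else None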

The use of a policy selects the (no longer very) new EmailMessage class; the policy framework arrived in Python 3.3, EmailMessage was added provisionally in 3.4, and the API has been stable since 3.6. The older legacy email.message.Message class does not have this method, and should be avoided in new code for many other reasons as well.
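To make the effect of the policy visible, here is a minimal sketch that parses the same bytes twice. The raw message is a made-up sample with hypothetical example.com addresses; it merely stands in for response[0][1] from the fetch call.

```python
import email
from email.message import EmailMessage, Message
from email.policy import default

# Made-up sample message (assumption), standing in for response[0][1]
raw = (
    b"From: Alice <alice@example.com>\r\n"
    b"To: bob@example.com\r\n"
    b"Subject: Hello\r\n"
    b"Content-Type: text/plain; charset=utf-8\r\n"
    b"\r\n"
    b"Just a test.\r\n"
)

# Without a policy you get the legacy class (compat32 behaviour)
legacy = email.message_from_bytes(raw)
# With policy=default you get the modern EmailMessage class
modern = email.message_from_bytes(raw, policy=default)

print(type(legacy).__name__, hasattr(legacy, "get_body"))
print(type(modern).__name__, hasattr(modern, "get_body"))
print(modern.get_body().get_content().strip())
```

Only the object parsed with policy=default has get_body and get_content, which is why the question's code had nothing better than get_payload to call.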

This could fail for multipart messages with nontrivial nested structures; the get_body method without arguments can return a multipart/related message part (or None if it finds no body at all) and then you have to take it from there. You haven't specified what your messages are expected to look like so I won't delve further into that.
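One defensive way to deal with that is to name the text subtypes you accept and to check for None. The following is only a sketch: the helper name extract_body_text and the sample messages are hypothetical, built locally rather than fetched over IMAP.

```python
from email.message import EmailMessage


def extract_body_text(msg, preference=("plain", "html")):
    """Return the decoded text of the preferred body part, or None.

    Leaving 'related' out of the preference list means get_body()
    only ever returns a text part, never a multipart container.
    """
    part = msg.get_body(preferencelist=preference)
    if part is None:
        return None
    return part.get_content()


# Hypothetical sample: plain + HTML alternatives plus a JSON attachment
msg = EmailMessage()
msg["Subject"] = "Get json file"
msg.set_content("Here is the report you wanted.")
msg.add_alternative("<p>Here is the report you wanted.</p>", subtype="html")
msg.add_attachment(b'{"ok": true}', maintype="application",
                   subtype="json", filename="report.json")

print(msg.get_content_type())
print(extract_body_text(msg).strip())
print(extract_body_text(msg, preference=("html",)).strip())

# A message without any text body: the helper returns None
binary_only = EmailMessage()
binary_only.set_content(b"\x00\x01", maintype="application",
                        subtype="octet-stream")
print(extract_body_text(binary_only))
```

The order of the preference tuple decides which alternative wins, so swap it if you would rather have the HTML version.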

More fundamentally, you probably need a more nuanced picture of how modern email messages are structured. See What are the "parts" in a multipart email?
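To get a feel for that structure, it can help to print the MIME tree of a message before deciding which part you want. This is a rough sketch; describe_structure is a hypothetical helper and the message is a locally built sample, not one of yours.

```python
from email.message import EmailMessage


def describe_structure(msg):
    """Return one indented line per MIME part of an EmailMessage."""
    lines = []

    def visit(part, depth):
        label = part.get_content_type()
        if part.get_filename():
            label += f" (attachment: {part.get_filename()})"
        lines.append("  " * depth + label)
        if part.is_multipart():
            # iter_parts() yields the direct children of a multipart part
            for sub in part.iter_parts():
                visit(sub, depth + 1)

    visit(msg, 0)
    return lines


# Hypothetical sample: text and HTML alternatives plus one attachment
msg = EmailMessage()
msg.set_content("Here is the report you wanted.")
msg.add_alternative("<p>Here is the report you wanted.</p>", subtype="html")
msg.add_attachment(b'{"ok": true}', maintype="application",
                   subtype="json", filename="report.json")

print("\n".join(describe_structure(msg)))
```

For this sample the tree is a multipart/mixed container holding a multipart/alternative (text/plain and text/html) and the application/json attachment, which matches the two Message objects the question's get_payload() call printed.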

0
Pedroski On

My problem was, I got this for the email body.

<email.message.Message object at 0x7f7d9f928df0>

I didn't know it, but that is all you need! I didn't know what to do with it! I suppose, when you have done this a few times, you know this and it is easy!

You can get the above in 2 ways, of which the first is more direct:

1.

status, messageParts = M.fetch(num, '(RFC822)')
email_message = email.message_from_bytes(messageParts[0][1], policy=default)

gives: email_message

<email.message.EmailMessage object at 0x7fae30730ca0>

2.

status, messageParts = M.fetch(num, '(RFC822)')
emailBody = messageParts[0][1]
raw_email_string = emailBody.decode('utf-8')
email_message = email.message_from_string(raw_email_string)

gives: email_message

<email.message.Message object at 0x7fae31758af0>

Without policy=default, the second way produces a legacy email.message.Message rather than an EmailMessage, and the decode step fails if the raw bytes are not valid UTF-8, so the first way is preferable.

Either way, you want email_message to be an email.message.EmailMessage object (pass policy=default in the second way too), which is walkable. You can get all the info you want!

email_message['To']
'[email protected]'
email_message['From']
'Pedro Rodriguez <[email protected]>'
email_message['Date']
'Sun, 25 Feb 2024 16:55:55 +0100'

To get the plain body text:

for part in email_message.walk():
    if part.get_content_type() == "text/plain":
        # get_content() needs an EmailMessage (policy=default); it undoes
        # base64/quoted-printable and decodes the charset for you
        print(part.get_content())

gives:

Here is the report you wanted.

To get html text:

for part in email_message.walk():
    if part.get_content_type() == "text/html":
        print(part.get_content())

gives:

Here is the report you wanted.
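Whether the text prints cleanly depends on how the part is encoded in transit. The sketch below builds a made-up base64-encoded text part (an assumption for illustration, not one of my real emails) and shows what each accessor returns for it.

```python
import base64
import email
from email.policy import default

text = "Here is the report you wanted."
encoded = base64.b64encode(text.encode("utf-8"))

# Made-up single-part message whose body is base64 encoded
raw = (
    b"Subject: Encoded body\r\n"
    b"MIME-Version: 1.0\r\n"
    b"Content-Type: text/plain; charset=utf-8\r\n"
    b"Content-Transfer-Encoding: base64\r\n"
    b"\r\n" + encoded + b"\r\n"
)
msg = email.message_from_bytes(raw, policy=default)

for part in msg.walk():
    if part.get_content_type() == "text/plain":
        # Raw payload: still the base64 text
        print(part.get_payload().strip())
        # decode=True undoes the transfer encoding and returns bytes
        print(part.get_payload(decode=True))
        # Works on legacy Message objects too: decode the bytes yourself
        charset = part.get_content_charset() or "utf-8"
        print(part.get_payload(decode=True).decode(charset))
        # EmailMessage only: transfer encoding and charset in one step
        print(part.get_content())
```

So a plain get_payload() only looks right when the sender happened to use 7bit or 8bit encoding; get_content() is the safer default.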

And if you want to get an attached json file:

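import os

# savepath (the base download directory) must be defined elsewhere
os.makedirs(os.path.join(savepath, 'email_attachments'), exist_ok=True)
for part in email_message.walk():
    # get attached json file
    if part.get_content_type() == "application/json":
        filename = part.get_filename()
        if filename:
            # basename() drops any directory components in the sender's filename
            filePath = os.path.join(savepath, 'email_attachments', os.path.basename(filename))
            with open(filePath, 'wb') as f:
                # decode=True undoes base64/quoted-printable and returns bytes
                f.write(part.get_payload(decode=True))
            print(f"Saved TLS-RPT report: {filePath}")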
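With an EmailMessage you can also let iter_attachments() skip the body parts for you. This is a self-contained sketch under my own assumptions: save_json_attachments is a hypothetical helper, the message is built locally, and a temporary directory stands in for the real download folder.

```python
import os
import tempfile
from email.message import EmailMessage


def save_json_attachments(msg, target_dir):
    """Save every application/json attachment of msg into target_dir."""
    saved = []
    # iter_attachments() yields the direct non-body parts of the message
    for part in msg.iter_attachments():
        if part.get_content_type() != "application/json":
            continue
        filename = part.get_filename()
        if not filename:
            continue
        # basename() guards against path components in the filename
        path = os.path.join(target_dir, os.path.basename(filename))
        with open(path, "wb") as f:
            f.write(part.get_payload(decode=True))
        saved.append(path)
    return saved


# Hypothetical sample message with one JSON attachment
msg = EmailMessage()
msg.set_content("Here is the report you wanted.")
msg.add_attachment(b'{"ok": true}', maintype="application",
                   subtype="json", filename="report.json")

with tempfile.TemporaryDirectory() as tmp:
    paths = save_json_attachments(msg, tmp)
    print([os.path.basename(p) for p in paths])
    with open(paths[0], "rb") as f:
        print(f.read())
```

Unlike walk(), iter_attachments() does not recurse into nested multipart parts, so for deeply nested messages the walk() loop above is still the more thorough option.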