How to print email body from Outlook without signature - Python


I'm trying to parse emails from Outlook. I would like the following printed:

  • subject
  • body (excluding sender's signature)
  • ignore all previous emails in the conversation (replies & forwards)

Is there a way to print only the body text that comes before a run of several blank lines? That is usually how the signature is separated from the main text.

Any help would be appreciated!

import win32com.client
# other libraries to be used in this script
import os
from datetime import datetime, timedelta

outlook = win32com.client.Dispatch('outlook.application')
mapi = outlook.GetNamespace("MAPI")

# List the accounts configured in this Outlook profile
for account in mapi.Accounts:
    print(account.DeliveryStore.DisplayName)

inbox = mapi.GetDefaultFolder(6)  # 6 = olFolderInbox

messages = inbox.Items
messages.Sort('[ReceivedTime]', True)  # newest first
# Only keep messages received in the last 24 hours
received_dt = datetime.now() - timedelta(days=1)
# %I (12-hour clock) pairs with %p; %H would produce e.g. "14:30 PM"
received_dt = received_dt.strftime('%m/%d/%Y %I:%M %p')
messages = messages.Restrict("[ReceivedTime] >= '" + received_dt + "'")
messages = messages.Restrict("[SenderEmailAddress] = '[email protected]'")
message = messages.GetFirst()

print("Messages received since: " + received_dt)
while message:
    print(message.Subject)
    print(message.body)
    message = messages.GetNext()

There is 1 answer

Answered by Selcuk (best answer)

You can use a regex to ignore everything after three newlines (there are normally one or two newlines between paragraphs):

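import re

# Non-greedy capture, so the match stops at the FIRST run of three line breaks.
# \r? allows for the \r\n line endings that Outlook bodies typically use.
r = re.compile(r"(.*?)(?:\r?\n){3}", re.DOTALL)

# ...

while message:
    # ...
    match = r.match(message.body)
    if match:
        # group(1) is the captured text; groups(0) would return a tuple
        body_without_signature = match.group(1)
    else:
        # No signature found
        body_without_signature = message.body
    print(body_without_signature)
    message = messages.GetNext()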
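
The question also asks to drop the earlier emails in the conversation, which the blank-line rule alone does not cover. Below is a minimal sketch of one way to handle both, not a definitive solution. The function name `main_text` and the two patterns are hypothetical, and the history markers (`-----Original Message-----`, a long underscore rule, or a `From:` line followed by `Sent:`/`Date:`) are an assumption based on how English-language Outlook commonly formats plain-text replies and forwards; other clients and locales will need different markers.

```python
import re

# Two or more blank lines in a row, i.e. three or more line breaks.
# Handles both \r\n and \n, and tolerates spaces/tabs on the "blank" lines.
SIGNATURE_GAP = re.compile(r"(?:\r?\n[ \t]*){3,}")

# ASSUMPTION: markers that commonly start the quoted history in
# English-language Outlook replies and forwards. Adjust for your mail.
HISTORY_START = re.compile(
    r"^(?:-{2,}\s*Original Message\s*-{2,}"   # -----Original Message-----
    r"|_{10,}"                                # ________________ separator
    r"|From:\s.+\r?\n(?:Sent|Date):\s)",      # From: ... / Sent: ... header
    re.MULTILINE | re.IGNORECASE,
)


def main_text(body):
    """Return the text before the quoted history and the signature gap."""
    # 1. Cut off everything from the first history marker onwards.
    history = HISTORY_START.search(body)
    if history:
        body = body[:history.start()]

    # 2. Trim outer whitespace so leading blank lines are not
    #    mistaken for the gap in front of a signature.
    body = body.strip()

    # 3. Cut off everything from the first run of blank lines onwards.
    gap = SIGNATURE_GAP.search(body)
    if gap:
        body = body[:gap.start()]
    return body.strip()
```

Inside the loop this would be used as `print(main_text(message.body))`. Searching for the markers and slicing, rather than matching the whole body with one pattern, keeps the two rules independent, so either one can be tuned or removed without touching the other.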