When a message with an attachment is saved in Microsoft Outlook, it is saved as a '.msg' file, which contains all the content of the email along with the attachment files. I'd like to extract the textual content of the email body as well as its attachments. Does Apache Tika support '.msg' files? If not, any other ideas?
How to extract the content of '.msg' files generated by Outlook?
4.7k views, asked by HHH
There are 2 answers
If you look at the list of mail formats supported by Apache Tika 1.9 (currently the latest version), you'll see that Outlook MSG files are listed as being supported.
Taking a simple example MSG file from the Apache POI project's test files, and using the Tika App standalone jar to keep testing simple, we can get out both the metadata and the contents:
$ java -jar tika-app-1.9.jar --metadata simple_test_msg.msg
Author: Travis Ferguson
Content-Length: 16896
Content-Type: application/vnd.ms-outlook
Creation-Date: 2007-07-06T05:27:17Z
Last-Modified: 2007-07-06T05:27:17Z
Last-Save-Date: 2007-07-06T05:27:17Z
Message-Bcc:
Message-Cc:
Message-From: Travis Ferguson
Message-Recipient-Address: [email protected]
Message-To: [email protected]
X-Parsed-By: org.apache.tika.parser.DefaultParser
X-Parsed-By: org.apache.tika.parser.microsoft.OfficeParser
creator: Travis Ferguson
date: 2007-07-06T05:27:17Z
dc:creator: Travis Ferguson
dc:description: test message
dc:title: test message
dcterms:created: 2007-07-06T05:27:17Z
dcterms:modified: 2007-07-06T05:27:17Z
meta:author: Travis Ferguson
meta:creation-date: 2007-07-06T05:27:17Z
meta:save-date: 2007-07-06T05:27:17Z
modified: 2007-07-06T05:27:17Z
resourceName: simple_test_msg.msg
subject: test message
title: test message
$ java -jar tika-app-1.9.jar --text simple_test_msg.msg
test message
From
Travis Ferguson
To
[email protected]
Recipients
[email protected]
This is a test message.
Metadata (including sender, recipients, dates, etc.) and text: all you could want!
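If you only have the standalone jar and want to consume its output from your own program, the `--metadata` listing above is easy to parse because it is line-oriented `Key: value` text. Here is a minimal sketch in Java, not a definitive implementation. The class name `TikaAppOutput` and its `run` and `parse` helpers are made up for illustration, and the sketch assumes `java` is on the PATH and that the jar prints in the platform default charset. Keys may themselves contain colons (`dc:title`) and may repeat (`X-Parsed-By`), so it splits on the first colon-plus-space and keeps every value.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class TikaAppOutput {

    /** Runs the Tika App jar with one mode flag (e.g. "--metadata" or "--text") and returns its stdout lines. */
    static List<String> run(String tikaJar, String mode, String msgFile)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder("java", "-jar", tikaJar, mode, msgFile)
                .redirectError(ProcessBuilder.Redirect.INHERIT) // keep warnings out of the parsed output
                .start();
        List<String> lines = new ArrayList<>();
        // Assumption: the jar writes in the platform default charset.
        try (BufferedReader reader =
                     new BufferedReader(new InputStreamReader(process.getInputStream()))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        int exit = process.waitFor();
        if (exit != 0) {
            throw new IOException("tika-app exited with status " + exit);
        }
        return lines;
    }

    /** Parses "Key: value" lines; a repeated key keeps all of its values in order. */
    static Map<String, List<String>> parse(List<String> lines) {
        Map<String, List<String>> metadata = new LinkedHashMap<>();
        for (String line : lines) {
            String key;
            String value;
            int separator = line.indexOf(": "); // first colon followed by a space
            if (separator >= 0) {
                key = line.substring(0, separator);
                value = line.substring(separator + 2).trim();
            } else if (line.endsWith(":")) {     // empty value, e.g. "Message-Bcc:"
                key = line.substring(0, line.length() - 1);
                value = "";
            } else {
                continue;                        // not a metadata line
            }
            metadata.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
        }
        return metadata;
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("usage: java TikaAppOutput <tika-app.jar> <file.msg>");
            return;
        }
        Map<String, List<String>> metadata = parse(run(args[0], "--metadata", args[1]));
        System.out.println("subject = " + metadata.get("subject"));
        System.out.println("from    = " + metadata.get("Message-From"));
        // The plain-text rendering of the message, printed as-is.
        run(args[0], "--text", args[1]).forEach(System.out::println);
    }
}
```

You would run it as `java TikaAppOutput tika-app-1.9.jar simple_test_msg.msg`. To write the attachments themselves to disk, I believe the app also has an `--extract` (`-z`) option, but check that against `--help` for your version.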
Alternatively, if you have special requirements and want full control, you can use the underlying Apache POI HSMF library to parse your MSG files. Look at the HSMF unit tests for usage examples.
Tika does support MSG files.
You can also use Apache POI directly; there are some examples around, like this one.
sample: