I have a PST file which contains the email history of a user. The task is to read this PST file and reconstruct the email history to display it in a client. This includes correctly displaying conversations, as you know them from email clients:
Meeting at 8:00 07:34 am
AW: Meeting at 8:00 09:12 am
AW: AW: Meeting at 8:00 01:45 pm
[Jenkins Build] Success 11:54 am
[Jenkins Build] Failed 12:13 pm
[Jenkins Build] Success 01:12 pm
[Jenkins Build] Success 10:34 am
[Jenkins Build] Failed 12:12 pm
[Jenkins Build] Success 05:12 pm
However, I don't know how I could do this reliably.
I am using java-libpst (see Official Documentation), which provides a PSTMessage object. There is a method getConversationId(), but it appears to return just the original subject of the message as a string, which means there may be duplicates (e.g. [Jenkins Build]*).
So, I am not sure how Outlook reconstructs conversations, or whether this is trivial. If there is a simple method that I am overlooking, I'd be happy if somebody would let me know. Otherwise I will end up parsing a ton of subject fields and trying to match emails by subject, with the danger of failing to tell apart different conversations that coincidentally share a subject.
I think you will need to construct the conversations yourself. You might find the source code referenced on this page about the Netscape Mail message threading algorithm helpful.
I copied the source code to GitHub. Here's the email Threader.java file.
Here is someone offering an explanation of how Gmail constructs conversations. My gist is:
The in-reply-to email field can create participants in an email conversation even if they weren't explicit participants.
Where:
equivalent subject: either an identical subject, or a subject that would result from replying or forwarding, i.e. "FW: X", "RE: X", "Fwd: X", etc.
explicit participants in an email: the sender or any email address appearing in a TO: or CC: field. (Maybe a BCC: field too...)
participants in an email: explicit participants in an email, or anyone who has sent a later email using the in-reply-to field.
participants in any previous email: the distinct email addresses that are participants in an email with an earlier send date and a subject equivalent to that of the current email.
Here's another exposition of email fields relevant to email threading. What I took from this is that the References header should also be consulted in addition to the in-reply-to header, and that it is more reliable. (Maybe, if present, it should supersede the in-reply-to header.)
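To make the header-based idea concrete, here is a minimal sketch of grouping messages by their Message-ID, In-Reply-To and References values, plus a helper for the "equivalent subject" notion. It is not the Netscape algorithm and not what Outlook or Gmail actually do. It just merges every message with the IDs it points at (a union-find). The Mail class and the ConversationGrouper name are made up for illustration. I believe java-libpst's PSTMessage offers getters such as getInternetMessageId() and getInReplyToId() that could fill these fields, but treat that as an assumption and check it against the version you use. The prefix list in normalizeSubject (RE, AW, FW, FWD, WG) is also just a guess at what your mailbox contains.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class ConversationGrouper {

    /** Hypothetical holder for the header values relevant to threading. */
    static final class Mail {
        final String messageId;        // Message-ID header, assumed non-null
        final String inReplyTo;        // In-Reply-To header, may be null
        final List<String> references; // IDs from the References header, may be empty
        final String subject;

        Mail(String messageId, String inReplyTo, List<String> references, String subject) {
            this.messageId = messageId;
            this.inReplyTo = inReplyTo;
            this.references = references;
            this.subject = subject;
        }
    }

    // Union-find parent pointers, keyed by message ID.
    private final Map<String, String> parent = new HashMap<>();

    private String find(String id) {
        parent.putIfAbsent(id, id);
        String root = id;
        while (!root.equals(parent.get(root))) {
            root = parent.get(root);
        }
        // Path compression: point every node on the way directly at the root.
        String current = id;
        while (!current.equals(root)) {
            String next = parent.get(current);
            parent.put(current, root);
            current = next;
        }
        return root;
    }

    private void union(String a, String b) {
        String rootA = find(a);
        String rootB = find(b);
        if (!rootA.equals(rootB)) {
            parent.put(rootA, rootB);
        }
    }

    /** Groups mails that are linked through In-Reply-To or References. */
    Collection<List<Mail>> group(List<Mail> mails) {
        for (Mail mail : mails) {
            find(mail.messageId); // register the message even if nothing links to it
            if (mail.inReplyTo != null) {
                union(mail.messageId, mail.inReplyTo);
            }
            for (String reference : mail.references) {
                union(mail.messageId, reference);
            }
        }
        // Keep input order so conversations appear in order of their first mail.
        Map<String, List<Mail>> byRoot = new LinkedHashMap<>();
        for (Mail mail : mails) {
            byRoot.computeIfAbsent(find(mail.messageId), key -> new ArrayList<>()).add(mail);
        }
        return byRoot.values();
    }

    /** Strips reply/forward prefixes so "AW: AW: X" and "X" compare equal. */
    static String normalizeSubject(String subject) {
        String current = subject.trim();
        String previous;
        do {
            previous = current;
            current = current.replaceFirst("(?i)^(re|aw|fw|fwd|wg)\\s*:\\s*", "");
        } while (!current.equals(previous));
        return current;
    }
}
```

With this, the three "Meeting at 8:00" mails end up in one conversation as long as the replies carry the original's Message-ID, while two unrelated "[Jenkins Build] Success" mails stay separate because nothing links them. normalizeSubject would only be a fallback for messages that have lost their headers, with exactly the false-match risk described in the question.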