I'm trying to work out how best to connect / thread a chain of emails. This seems like such a common problem that I was surprised I couldn't easily find information on how other people have dealt with it. The only thing I found was a post about JWZ threading, which looked more concerned with piecing together a thread within one email. I was wondering if anyone could point me to some current solutions.
I'm using the thoughtbot griddler gem to process incoming emails into a Message model and a separate Contact model, and I have a third model for storing replies, e.g. Reply.
My current thinking is to thread them by the unique contact and the subject line. But then again, the subject line will change slightly, e.g. from "This subject" to "Re: re: This subject". I could use a regex to strip out the "re:"s, or I could use something like amatch to do string comparisons.
But then again, what should I do about the same subject appearing for the same user 2 months later? I could also add some logic based on the message date so that threads only include recent emails. And there might be something else useful stored in the email header itself?
- User (by unique email address)
- Unique Subject line (regex re: processing issues?)
- Current date (emails must be close in date to each other)
- Some other clues to look for in the email header?
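As a rough, untested-in-production sketch of the heuristic in the list above: group by contact plus a normalized subject, then split each group wherever two consecutive emails are too far apart in time. The hash keys (`:contact`, `:subject`, `:date`), the method name `fallback_threads` and the 30-day default are all made up for illustration; they are not griddler's API.

```ruby
# Group messages by (contact, normalized subject), then start a new thread
# whenever the gap between consecutive messages exceeds max_gap_days.
# Each message is a plain hash: { contact: String, subject: String, date: Time }
# (a hypothetical shape, not what griddler hands you).
def fallback_threads(messages, max_gap_days: 30)
  max_gap_seconds = max_gap_days * 24 * 60 * 60
  # One or more leading "Re:", "Fw:" or "Fwd:" prefixes, case-insensitive.
  reply_prefix = /\A(?:\s*(?:re|fwd?)\s*:\s*)+/i

  groups = messages.group_by do |m|
    [m[:contact].to_s.downcase,
     m[:subject].to_s.sub(reply_prefix, "").strip.downcase]
  end

  groups.values.flat_map do |msgs|
    msgs.sort_by { |m| m[:date] }
        .slice_when { |a, b| b[:date] - a[:date] > max_gap_seconds }
        .to_a
  end
end
```

With this, "This subject" and "Re: re: This subject" from the same contact land in one thread if they are a few days apart, while the same subject two months later starts a new one. It is only a fallback guess: two unrelated emails with the same subject inside the window would still be merged.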
I have a rough idea of how to do it; I'm just curious to see some current implementations, and I can't seem to find any.
Any pointers would be greatly appreciated!
Email threads are a linked list: the headers contain enough information to reconstruct the list from its component parts.
Inspect the email headers and look for a few specific ones.
The key ones you'll use are Message-ID, In-Reply-To and References. These headers give you information about which message was replied to and what other IDs matter to the email thread itself.
The easiest way to find information about the headers of an email is to open the 'Original Message' in Gmail (from the more menu).
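To make that concrete, here is a minimal sketch of following those three headers back to the start of a thread. It assumes each email's headers are available as a plain hash with string keys (how you get that out of griddler or your Message model is up to you), and the method names are hypothetical.

```ruby
require "set"

# Pull every "<id@host>" token out of a raw header value (nil-safe).
def extract_ids(header_value)
  header_value.to_s.scan(/<[^<>\s]+>/)
end

# In-Reply-To names the direct parent; if it is missing, the last entry
# in References is the nearest ancestor.
def parent_id(headers)
  extract_ids(headers["In-Reply-To"]).first ||
    extract_ids(headers["References"]).last
end

# Follow parent links upward until we hit an id with no recorded parent.
# The `seen` set only guards against looping forever on malformed headers.
def thread_root_id(message_id, parents)
  seen = Set.new
  current = message_id
  current = parents[current] while parents[current] && seen.add?(current)
  current
end

# messages: array of header hashes. Returns { root_message_id => [headers, ...] }.
def group_into_threads(messages)
  parents = {}
  messages.each do |h|
    parents[extract_ids(h["Message-ID"]).first] = parent_id(h)
  end
  messages.group_by do |h|
    thread_root_id(extract_ids(h["Message-ID"]).first, parents)
  end
end
```

If a reply points at a message you never received, the thread is simply keyed by that missing Message-ID, so other replies to the same missing message still end up grouped together. In a Rails app you would more likely store `message_id` and the parent id as columns and look the parent up on arrival, rather than regrouping everything in memory.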