Space-efficient marketing email storage


I'm working on a mail gateway that will automatically provide (among other things) "view in browser" functionality for every email sent through it. This means all emails have to be stored somewhere they can be easily accessed. Even though the retention period is limited, and even with gzip applied to each message before saving, we're looking at ~500GB of storage just to keep recent messages.

Since the emails are mostly identical (except for a few personal variables), I was wondering whether there is a more efficient way to store them: something that deduplicates content across multiple records, for example. Any suggestions?

An alternative would be to save the template once and store only the variables for each email sent, but we don't want to require that, because the process should be transparent to the sender. That means the template and variables are not available to the gateway and would have to be deduced after the fact.


There are 2 answers

1
John On

This should all be done dynamically. Store the email once, as it existed before the subscriber-specific content (merge tags/variables) was added. Give each subscriber a unique "view in browser" link in the email, and use that link to look up and serve their variables in the browser-based version.

If there is a lot of unique content, you might want to store it in a database; otherwise, if it is just their name for example, you could pass it as a URL parameter.
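A minimal sketch of this idea follows. It assumes the gateway has access to the unmerged template, which the question says is not the case, so treat it as an illustration only. The in-memory dictionaries, the function names, and the `$name` placeholder syntax are all hypothetical; a real gateway would use a database and whatever merge-tag syntax the sender uses.

```python
import secrets
from string import Template

# Hypothetical in-memory stores; a real gateway would use a database.
templates = {}   # campaign_id -> email body, stored once per campaign
recipients = {}  # link token -> (campaign_id, variables for one recipient)

def store_campaign(campaign_id, body):
    """Store the shared, unmerged email body once."""
    templates[campaign_id] = body

def register_recipient(campaign_id, variables):
    """Store only the per-recipient variables and return a unique link token."""
    token = secrets.token_urlsafe(16)
    recipients[token] = (campaign_id, variables)
    return token

def render_for_browser(token):
    """Rebuild the personalised email when the "view in browser" link is opened."""
    campaign_id, variables = recipients[token]
    return Template(templates[campaign_id]).safe_substitute(variables)

store_campaign("spring", "Hello $name, your code is $code.")
token = register_recipient("spring", {"name": "Ann", "code": "X1"})
print(render_for_browser(token))
```

The token would be embedded in each recipient's "view in browser" URL, so per-email storage shrinks to the variables plus the token.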

0
Filip Hanes On
  1. If there are duplicated images, attachments, or other parts, you can deduplicate them based on their content hash.

  2. You could pack multiple messages into a TAR or MBOX file and compress the whole file before storing it. The compression ratio would be better because one file contains more duplicate bytes. Random access to a single email becomes harder, though, depending on how many emails are compressed into one file.

  3. Train a custom compression dictionary and compress each email independently. Zstd, for example: https://facebook.github.io/zstd/#small-data
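The first option in the list can be sketched with Python's standard library. This is only an illustration: `blob_store` and `store_parts` are hypothetical names, the store is an in-memory dictionary, and the choice of SHA-256 is an assumption. Each leaf MIME part is stored once under its content hash, and a message is then represented by its list of hashes.

```python
import hashlib
from email import message_from_bytes
from email.message import EmailMessage

blob_store = {}  # sha256 hex digest -> decoded part payload, stored once

def store_parts(raw_message):
    """Store each leaf MIME part by content hash; return the list of hashes."""
    message = message_from_bytes(raw_message)
    digests = []
    for part in message.walk():
        if part.is_multipart():
            continue  # container parts carry no payload of their own
        payload = part.get_payload(decode=True) or b""
        digest = hashlib.sha256(payload).hexdigest()
        blob_store.setdefault(digest, payload)  # no-op if already stored
        digests.append(digest)
    return digests

def build_example(name):
    """Build a demo message: personalised text plus a shared attachment."""
    message = EmailMessage()
    message["Subject"] = "Spring sale"
    message.set_content(f"Hello {name}")
    message.add_attachment(b"fake logo bytes", maintype="image",
                           subtype="png", filename="logo.png")
    return message.as_bytes()

ann_parts = store_parts(build_example("Ann"))
bob_parts = store_parts(build_example("Bob"))
print(len(blob_store), "unique parts stored for 2 messages")
```

To rebuild a message exactly, the headers and MIME structure would also have to be kept alongside the hash list; this sketch only covers the payloads.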

EDIT: added third solution
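The third option can be illustrated without extra dependencies by swapping in zlib's preset dictionary (the `zdict` argument) for a trained Zstd dictionary. This is the same idea in a simpler form, not the Zstd API: here the "dictionary" is just one sample email used as-is rather than a trained one, and deflate can only reference the last 32 KiB of it. The function names and the sample data are hypothetical.

```python
import zlib

def compress_with_dict(data, dictionary):
    """Compress one email independently, primed with a shared dictionary."""
    compressor = zlib.compressobj(level=9, zdict=dictionary)
    return compressor.compress(data) + compressor.flush()

def decompress_with_dict(blob, dictionary):
    """Decompress one email; the same dictionary must be supplied."""
    decompressor = zlib.decompressobj(zdict=dictionary)
    return decompressor.decompress(blob) + decompressor.flush()

# A sample email from the same campaign serves as the dictionary (assumption).
dictionary = (
    b"<html><body><h1>Spring sale</h1><p>Hello Ann,</p>"
    b"<p>Everything in the store is 20% off this week. "
    b"Use the code below at checkout.</p><p>Code: ANN-123</p>"
    b"<p>Unsubscribe at any time.</p></body></html>"
)
email = dictionary.replace(b"Ann", b"Bob").replace(b"ANN-123", b"BOB-456")

packed_plain = zlib.compress(email, 9)
packed_dict = compress_with_dict(email, dictionary)
print(len(email), len(packed_plain), len(packed_dict))
```

Because each email is still compressed on its own, random access stays cheap; the cost is that the dictionary has to be stored and versioned, since a blob can only be decompressed with the dictionary it was compressed with.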