I'm working on a mail gateway that will automatically provide (among other things) "view in browser" functionality for every email sent through it. This means we need to store all emails somewhere they can be easily accessed. Even with a limited retention period, and even with gzip applied before saving each message, we're looking at ~500GB of storage just to keep recent messages.
Since the emails are mostly identical (except for a few personal variables), I was wondering whether there is a more efficient way to store them, something that deduplicates content across multiple records. Any suggestions?
An alternative would be to save the template and store only the variables for each email sent, but we don't want to require that, as the process should be transparent to the sender. That means the template and variables would not be available to the gateway, and they would need to be deduced after the fact.
This should all be done dynamically. Store the email once, as it existed before you added your subscriber-specific content/merge tags (variables). Each email would need a 'view in browser' link unique to its subscriber. Based on that link, you would then fill in that subscriber's variables when serving the browser-based version.
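To make that concrete, here is a minimal sketch of the idea in Python. It assumes the merge tags can be expressed as `$name` placeholders, and it uses plain dictionaries as stand-ins for real storage; the names `templates`, `views`, `store_template`, `register_recipient` and `render_view` are all made up for illustration, not part of any particular mail system.

```python
import secrets
from string import Template

# In-memory stand-ins for real tables (hypothetical names).
templates = {}  # template_id -> email body with $placeholders, stored once
views = {}      # token -> (template_id, per-subscriber variables)

def store_template(template_id, body):
    """Save the email body once, before any merge tags are filled in."""
    templates[template_id] = body

def register_recipient(template_id, variables):
    """Record one subscriber's variables and return the token for their link."""
    token = secrets.token_urlsafe(16)  # unguessable, unique per subscriber
    views[token] = (template_id, dict(variables))
    return token

def render_view(token):
    """Rebuild the browser version from the shared template and the variables."""
    template_id, variables = views[token]
    # safe_substitute leaves unknown placeholders in place instead of raising.
    return Template(templates[template_id]).safe_substitute(variables)

store_template("welcome", "<p>Hello $first_name, your code is $code.</p>")
token = register_recipient("welcome", {"first_name": "Ada", "code": "X1"})
html_page = render_view(token)
```

With this layout the large body is stored once per mailing, and each message sent only adds a token plus a handful of small values.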
If there is a lot of unique content, you might want to store it in a database; if it is just their name, for example, you could pass it as a URL parameter in the link itself.
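For the URL parameter variant, a rough sketch could look like the following. The base URL, the parameter names `t` and `name`, and the helper functions are assumptions made up for this example. Keep in mind that anything placed in the link is visible to, and editable by, whoever holds it, so the value is escaped before it goes into the page.

```python
import html
from string import Template
from urllib.parse import urlencode, urlsplit, parse_qs

BASE_URL = "https://mail.example.com/view"  # hypothetical endpoint

def build_view_link(template_id, first_name):
    """Encode the template id and the subscriber's name into the link itself."""
    return BASE_URL + "?" + urlencode({"t": template_id, "name": first_name})

def render_from_link(link, templates):
    """Read the parameters back out of the link and fill in the template."""
    params = parse_qs(urlsplit(link).query)
    body = templates[params["t"][0]]
    # The name comes from the URL, so treat it as untrusted input.
    name = html.escape(params["name"][0])
    return Template(body).safe_substitute(first_name=name)

link = build_view_link("welcome", "Ada Lovelace")
page = render_from_link(link, {"welcome": "Hello $first_name!"})
```

Nothing per-subscriber is stored on the server in this version, which is why it only suits a small number of short values.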