Archiving Theory

221 views Asked by At

I am about to set up a new database that will need to include archiving of records that are still accessible. Records are all associated with certain projects and when the project is archived, I want the records to stay the same, a snapshot. (e.g. If a contact is associated with an archived project, and they move a year later, I want it to still pull the old address.) The archived records do not need to be updated, but they do need to be accessible.

I have an idea of how to go about this, but I am not sure if this is the best approach: Have a duplicate of each table that would "archive" everything, and then when putting an item in archive, all the FK/PK relationships would be updated, though that seems like a cumbersome process.

Another idea I had was each item (i.e. contact) would be assigned a PK and then there would be a secondary key for each item which would then be associated with each project. The main problem with this is it seems difficult if a contact updates on a live project, a lot of updates would be required.

Please let me know if you have any questions.

Thank you for the help.

2

There are 2 answers

0
Pedro Rolo On

Hm... The only time I came accross with something like this, there was the idea to solve it in the application layer, and not on the db.

For instance, for ruby, you may use vestal_versions or paper-trail.

Paper-trail, for instance, stores the versions of all objects as serialized objects on one single table, and works with deltas.

0
cdeszaq On

What you are looking for is to include temporal data along side your domain data. This sort of a thing isn't a trivial task to undertake and is often the cause of great complication in applications that need to support it.

There are a number of ways to go about doing this, each with their pros and cons, and the way you choose will depend on what you need to do with the temporal element of your data. Some of these include:

  1. Audit trail
    • You track the changes made to a record over time and the primary record reflects it's current state
    • Reduces data duplication to a minimum
    • Likely doesn't fit the "snapshot" model you are looking for easily
  2. Most Current
    • You have a record for each "version" of an entity with a timestamp of when it was created
    • Makes it easy to jump back to a point in time
    • Makes it easy to "fork" an entity
    • Has the most data duplication

Martin Fowler has written some articles relating to designing models that deal with temporal data, so I would start there for a good, solid grounding in the topic.