We are working on a system where a person can send a document to another person, a document can have multiple attachments, as given below.
Document {
Set<Attachment> attachments;
}
If X sends a document (Doc1,Ver1) to Y and Y edits the document then we have to create a new version (Doc1,Ver2) of the document so that X sent box doesn't reflect the changes made by Y.
In our system there will be millions of documents where each document can have hundreds of attachments. Also a document can travel through n number of persons over a period of time.
We should be able to fetch any document version along with the list of attachments it had at that time, so I have to maintain versions of document, so the table structure that came to my mind immediately was the one below.
Document - id primary key, ver_id primary key
Attachment - id, doc_id foreign key, doc_ver_id foreign key
But the problem with this structure is that if X is sending a document to Y with 100 attachment and Y has made some minor modification then I have to create a new version along with copying all the attachments for the new version, most of which are same as the one in previous version, since we will be having millions of documents and each document will move through n number of persons this model will result in a very huge attachment table with lot of redundant data.
So we thought of an alternate structure for attachment, as below.
Document - id primary key, ver_id primary key
Attachment - id, doc_id, attached_ver_id, detached_version_id
But I couldn't create a hibernate entity for Document with this structure, so my question is, is there any other table structure which is better equipped to solve this problem with less redundancy and is it possible to create a hibernate mapping for the above table structure.
Background
When data history needs to be kept there are usually two possible approaches:
Method #1: Cloning
When a new entry is created, its details are copied from the most recent existing entry.
Method #2: Deltas
The details for the very first entry are stored. Each subsequent entry stores the changes from the previous version.
Pros/Cons:
Method #1 is generally simpler and faster as the details for any record can be looked up directly without needing to build them up. But Method #2 uses less storage. (Probably worth noting that in my experience Method #1 has always been preferable because simplicity and speed of retrieval is usually more of an important consideration than storage).
What is being asked?
My understanding is you started off with Method #1 but now prefer Method #2.
Answer
It should be perfectly possible to create entities for this database structure - each entity is of course just the Hibernate class representation of a database table. Would suggest adding a mapping table between Document Version and Attachment:
Here
DocumentAttachmentDelta
is an explicitly defined mapping table with @ManyToOne relationships linking to the primary keys identifying the document version and attachment. It has an additional boolean flag that specifies whether the attachment is being removed or added for this version of the document. So from above, for the first version of the document all its initial attachments would be added but for subsequent versions only the deltas would be stored, which could be additions or removals.Entity details (following a request in the comments)