I would like to store a large number of JSON documents in a document-oriented database. They all have a very similar (though not identical) schema.
One example document:
{
"firstName": "John",
"lastName": "Smith",
"age": 25
}
Do any of the systems (CouchDB etc.) use compression (of any sort) to avoid storing the key strings (e.g. "firstName") over and over again?
My motivation is to minimise the size of the database on disk when there are millions of documents, especially when some of the recurring keys are much longer than e.g. "firstName".
Thanks for your thoughts!
W
Edit: Having thought about this more, what I think I am asking about is a specific case of a more general compression system in which a compression dictionary is (partly?) shared across multiple compressed documents in a document store (and probably built up over time). This would then handle compression of more than just JSON keys.
Would be interesting to do!
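To make the shared-dictionary idea concrete, here is a minimal sketch using the preset-dictionary support in Python's standard zlib module (the zdict argument). The dictionary contents, the SHARED_DICT name and the two helper functions are my own assumptions for illustration; this is not how CouchDB or any other store is known to work, and a real system would have to build and version the dictionary over time.

```python
import json
import zlib

# Hypothetical shared dictionary: byte sequences expected to recur in most
# documents. It is stored once, outside the documents, and must be available
# unchanged when decompressing.
SHARED_DICT = b'{"firstName": "", "lastName": "", "age": }'


def compress_doc(doc: dict, zdict: bytes = SHARED_DICT) -> bytes:
    """Serialise a document to JSON and deflate it against the shared dictionary."""
    raw = json.dumps(doc).encode("utf-8")
    comp = zlib.compressobj(level=9, zdict=zdict)
    return comp.compress(raw) + comp.flush()


def decompress_doc(blob: bytes, zdict: bytes = SHARED_DICT) -> dict:
    """Inflate a blob produced by compress_doc using the same dictionary."""
    decomp = zlib.decompressobj(zdict=zdict)
    raw = decomp.decompress(blob) + decomp.flush()
    return json.loads(raw)


doc = {"firstName": "John", "lastName": "Smith", "age": 25}
blob = compress_doc(doc)
restored = decompress_doc(blob)
```

Because the key strings are found in the dictionary rather than in the document itself, each compressed document only pays for back-references to them. The catch is that the dictionary can never change without recompressing (or tagging) every document that was written with it.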
I would just add a 'key mapping' document in which you store the keys and their short aliases. Doing the mapping in your backend should not be much trouble.
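A rough sketch of what that backend mapping could look like, in Python. The KEY_MAP contents and the shorten/expand function names are made up for illustration; in practice the mapping would be loaded from the 'key mapping' document rather than hard-coded, and this version only handles top-level keys.

```python
# Hypothetical mapping, as it might be loaded from the 'key mapping' document.
KEY_MAP = {"firstName": "f", "lastName": "l", "age": "a"}
REVERSE_MAP = {short: long for long, short in KEY_MAP.items()}


def shorten(doc: dict) -> dict:
    """Replace known long keys with their short aliases before storing."""
    return {KEY_MAP.get(key, key): value for key, value in doc.items()}


def expand(doc: dict) -> dict:
    """Restore the original long keys after reading from the store."""
    return {REVERSE_MAP.get(key, key): value for key, value in doc.items()}


original = {"firstName": "John", "lastName": "Smith", "age": 25}
stored = shorten(original)   # what would be written to the database
loaded = expand(stored)      # what the application sees again
```

Keys that are not in the mapping pass through unchanged, so documents with slightly different schemas still work. One thing to watch for: an unmapped key that happens to equal one of the short aliases would be expanded incorrectly, so the aliases need to be chosen so they cannot collide with real keys.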