DB candidate as CouchDB/Schema replacement

The idea is to redesign the data structure and/or change the database. I have just started reviewing this project and plan to begin the optimization here.

Currently I have CouchDB with about 80 GB of document data, around 30M records. For most documents in that set, properties like id, group_id, location, and type can be considered generic, but unfortunately they are currently stored under different property names across the set. There is also a lot of deeply nested data.
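To make the naming problem concrete, here is a minimal Python sketch of how the generic properties could be pulled into canonical fields before loading the data anywhere else. The canonical names (id, group_id, location, type) come from the description above; the alternative spellings in `ALIASES` and the `payload` key are hypothetical, since the real variants in the data set are not listed here.

```python
from typing import Any

# Canonical names are from the question; the alias spellings are
# invented for illustration and would have to match the real data.
ALIASES = {
    "id": ("id", "_id", "ID"),
    "group_id": ("group_id", "groupId", "group"),
    "location": ("location", "loc"),
    "type": ("type", "doc_type", "kind"),
}


def normalize(doc: dict[str, Any]) -> dict[str, Any]:
    """Return canonical generic fields plus a 'payload' holding everything else."""
    remaining = dict(doc)  # work on a copy so the input is not mutated
    row: dict[str, Any] = {}
    for canonical, names in ALIASES.items():
        row[canonical] = None
        for name in names:
            if name in remaining:
                value = remaining.pop(name)
                # Keep the first alias found; later duplicates are dropped.
                if row[canonical] is None:
                    row[canonical] = value
    # Deeply nested, document-specific data stays untouched in the payload.
    row["payload"] = remaining
    return row
```

With this shape, the four generic fields can become ordinary indexed columns (or top-level keys), while the nested remainder stays schema-less.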

The structure isn't strictly defined, which is why a NoSQL database was selected long before the overall picture was clear.

The data is calculated and loaded into the database by a separate job on a powerful cluster. This doesn't happen very often, so I conclude that general write/update performance isn't very important. Reducing the size would be nice, but it isn't the main goal either. There are only about 1-10 active customers at a time. What matters most is read performance with various kinds of filtering, grouping, and so on. No heavy summary calculations are needed at query time, since those are already done during population.

This is a data analytics tool that displays comparison and other reports to quality engineers and data analysts, so they can browse, group, or filter the results from the Web UI.

At the moment, tasks such as searching a subset of document properties for a piece of text aren't possible because of performance.

Of course I've done some initial investigation (for example http://www.datastax.com/wp-content/themes/datastax-2014-08/files/NoSQL_Benchmarks_EndPoint.pdf), and Cassandra looks like a good choice among the NoSQL databases.

It would also be quite interesting to try porting this data to the new PostgreSQL.

Any ideas would be highly appreciated :-)

There is 1 answer

Angel Paraskov On

Hello, please check the following article:

http://www.enterprisedb.com/nosql-for-enterprise

For me, PostgreSQL's json (and jsonb!) capabilities let you start schema-less and still have transactions, indexes, grouping, and aggregate functions with very good performance right from the start. When you are ready (and if needed), you can move to a fixed schema with an internal data migration.
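As a rough illustration of the schema-less starting point, the Python sketch below only builds the SQL text and parameters; it does not connect to a database. The table name `docs`, the column `body`, and the index name are hypothetical, and the `%s` placeholders assume a psycopg-style driver. The operators themselves (`@>` for containment, `->>` for extracting a key as text) and GIN indexes on jsonb are standard PostgreSQL features.

```python
import json

# Hypothetical table and index names, used only for this sketch.
CREATE_TABLE = "CREATE TABLE docs (id bigserial PRIMARY KEY, body jsonb NOT NULL)"
# A GIN index on the jsonb column lets containment (@>) filters use an index.
CREATE_INDEX = "CREATE INDEX docs_body_gin ON docs USING GIN (body)"


def count_by(group_key: str, filters: dict) -> tuple[str, tuple]:
    """Build a parameterized query: filter by jsonb containment, group by one top-level key."""
    sql = (
        "SELECT body ->> %s::text AS grp, count(*) "
        "FROM docs "
        "WHERE body @> %s::jsonb "
        "GROUP BY 1 ORDER BY 2 DESC"
    )
    # The filter is sent as a JSON string and cast to jsonb on the server.
    return sql, (group_key, json.dumps(filters))
```

For example, `count_by("type", {"group_id": 7})` returns a statement and parameters that could be passed to a cursor's `execute` to count documents per type within one group. If the generic fields are later promoted to real columns, the same query becomes a plain `WHERE group_id = 7 GROUP BY type`.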

Also check: https://www.compose.io/articles/is-postgresql-your-next-json-database/

Good luck