Consistency effects in distributed (NoSQL) databases

Whenever I read about distributed NoSQL databases, the CAP theorem comes up, usually summarized like this: in a partitioned system you can have full consistency, full availability, or some of each, but never both in full.

What is not really clear to me is what type of consistency they are talking about:

  1. Is it consistency in data freshness, where some clients may get older data than others?
  2. Or is it consistency in the sense that transactions may complete only partially and this may bring the data in an inconsistent state?

The second interpretation sounds quite dangerous to me and not really acceptable. The first sounds acceptable, but how can you prevent a client that requests a set of data from being served a mix of outdated and fresh data?

How dangerous is it to only offer partial consistency and what are the possible negative effects?

There is 1 answer

simon at rcl On BEST ANSWER

Consistency in distributed databases is a huge problem, and in practice you have to deal with both of your options: stale data in some places, and partially completed transactions. I'm not going to write an essay about it, because the problem is big and the solutions are not easy. However, here are some key phrases.

Eventual Consistency is the usual answer to this, but implementing it is a big job. The key to the implementation is Idempotent Messages. Let's say a complete transaction involves updating data on machines A, B, and C. How do you actually do that? You start sending messages around, and keep resending each one until you receive an acknowledgement of receipt and successful processing. You may end up sending the message to B twice, either because B never got it or because B's ack never reached you. In the second case B has already processed the message, so B had better do the right thing when it gets it again (which may be to ignore it) and send you an ack so you stop bothering it.
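To make that concrete, here is a rough sketch of the resend-until-acked loop with a receiver that ignores duplicates. It runs in a single process and fakes the lost ack with a flag. The names (`Message`, `Node`, `send_until_acked`, `drop_next_ack`) are made up for illustration and don't come from any particular database or messaging library.

```python
import uuid
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Message:
    msg_id: str  # unique ID, reused unchanged on every retry (hypothetical scheme)
    key: str
    delta: int


@dataclass
class Node:
    name: str
    data: dict = field(default_factory=dict)
    seen: set = field(default_factory=set)  # IDs of messages already applied
    drop_next_ack: bool = False  # test hook: pretend the next ack is lost

    def receive(self, msg: Message) -> bool:
        """Apply msg at most once; return True if the ack reaches the sender."""
        if msg.msg_id not in self.seen:
            # First delivery: apply the update and remember the ID.
            self.data[msg.key] = self.data.get(msg.key, 0) + msg.delta
            self.seen.add(msg.msg_id)
        # A duplicate delivery skips the update but is still acknowledged.
        if self.drop_next_ack:
            self.drop_next_ack = False
            return False  # the work was done, but the sender never hears about it
        return True


def send_until_acked(node: Node, msg: Message, max_attempts: int = 5) -> int:
    """Resend msg until node acknowledges it; return the number of attempts."""
    for attempt in range(1, max_attempts + 1):
        if node.receive(msg):
            return attempt
    raise RuntimeError(f"no ack from {node.name} after {max_attempts} attempts")


# B applies the update, but its first ack gets lost, so the sender retries.
b = Node("B", drop_next_ack=True)
msg = Message(msg_id=str(uuid.uuid4()), key="balance", delta=10)
attempts = send_until_acked(b, msg)
```

Here `attempts` ends up as 2, but `b.data["balance"]` is 10 rather than 20, because the second delivery is recognised by its ID and ignored. A real system would also have to persist the set of seen IDs and the data change together, which this sketch skips.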

This looks like a pretty good article, and it's written from a NoSQL point of view. There are loads of links about Idempotent Messages in any search engine, so I'll let you root around.

Final note: Pat Helland, who worked on Distributed Databases for many years (at Microsoft and Amazon, among other places), eventually came to the conclusion that consistency for Distributed DBs was impossible, and that you'd better settle for Eventual Consistency via Idempotent Messages.