I have been reading Nathan Marz's article about how to beat the CAP theorem with the Lambda Architecture, and I don't understand how immutable data makes eventual consistency less complex.
The following paragraph is taken from the article:
The key is that data is immutable. Immutable data means there's no such thing as an update, so it's impossible for different replicas of a piece of data to become inconsistent. This means there are no divergent values, vector clocks, or read-repair. From the perspective of queries, a piece of data either exists or doesn't exist. There is just data and functions on that data. There's nothing you need to do to enforce eventual consistency, and eventual consistency does not get in the way of reasoning about the system.
Imagine the following example: I have a distributed insert-only database with two nodes, A and B, and both hold the record [timestamp=1; id=1; value=10]. Then, at the same time, there is an insert against node A, which results in [timestamp=2; id=1; value=20], and a read against node B for the record with id=1.
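A minimal sketch of that scenario, assuming each node stores a plain set of immutable records, replication means exchanging records (set union), and a read means "the value with the highest timestamp for this id". These are illustrative assumptions, not something the article specifies. The read on B can return the old value 10, but that answer is stale rather than conflicting: B's set is just a subset of A's, and once the nodes exchange records they hold identical data with nothing to reconcile.

```python
from dataclasses import dataclass


# Hypothetical model: each record is an immutable fact; a node is a set of facts.
@dataclass(frozen=True)
class Record:
    timestamp: int
    id: int
    value: int


def read_latest(facts, record_id):
    """A query is a pure function over the facts: newest value for an id, or None."""
    matching = [r for r in facts if r.id == record_id]
    if not matching:
        return None
    return max(matching, key=lambda r: r.timestamp).value


def merge(a, b):
    """Replication as set union: order-independent and idempotent, so no conflicts."""
    return a | b


initial = Record(timestamp=1, id=1, value=10)
node_a = {initial}
node_b = {initial}

# Concurrent operations: an insert on A and a read on B before replication.
node_a = node_a | {Record(timestamp=2, id=1, value=20)}
stale_read = read_latest(node_b, 1)  # B has not seen the insert yet

# Replicating in either direction produces the same set on both nodes.
node_a, node_b = merge(node_a, node_b), merge(node_b, node_a)
```

Because union never has to choose between two values, there is no read-repair or vector-clock logic in the merge step; the only question left is which facts a node has seen so far.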
How is eventual consistency any less complex to handle in this example than in a database that allows updates?
I'm not 100% sure I got it right, but I'll try to explain anyway.
Consider an example: you have two databases that accept reads and writes, connected by a network link. The link goes down, resulting in a network partition. We want our system to be available in the CAP sense, so both databases keep accepting reads and writes.
With mutable data structures: suppose a client connected to the first database wants to update the value of record X to A, and another client connected to the second database wants to update the same value to B. Since our system is available, each database accepts its write, but we have to resolve the conflict once the network partition is gone, which results in one of the two updates being lost.
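A minimal sketch of the mutable case, assuming each database is a dictionary mapping a key to a (value, timestamp) pair and that the conflict is resolved with last-write-wins. Last-write-wins is only one possible policy and is an assumption here; the point is that with in-place updates, some policy has to pick a winner.

```python
# Hypothetical model of two mutable replicas: each maps key -> (value, timestamp).
def write(replica, key, value, ts):
    replica[key] = (value, ts)  # an update overwrites the previous value in place


def resolve_last_write_wins(r1, r2):
    """Merge after the partition heals: keep the entry with the higher timestamp per key."""
    merged = {}
    for key in r1.keys() | r2.keys():
        candidates = [r[key] for r in (r1, r2) if key in r]
        merged[key] = max(candidates, key=lambda entry: entry[1])
    return merged


db1 = {"X": ("initial", 0)}
db2 = {"X": ("initial", 0)}

# During the partition each side accepts a conflicting update to X.
write(db1, "X", "A", ts=1)
write(db2, "X", "B", ts=2)

healed = resolve_last_write_wins(db1, db2)
# healed["X"] == ("B", 2): the update to "A" no longer exists anywhere.
```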
With immutable data structures, you wouldn't update the data but insert new records, so both writes would still be there after the network partition is gone. You'd still need some kind of time synchronization, though, to preserve the order of operations, which can be very tricky (see Sebastien Diot's comment on the article).
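The same partition with insert-only replicas can be sketched like this, assuming each fact is a hypothetical (key, value, timestamp, origin) tuple and that the databases' clocks are not synchronized. Merging is a plain union, so neither write is lost, but the order you reconstruct is only as good as the timestamps: here the write of "A" is assumed to have happened later in real time, yet it sorts first because its database's clock lags behind.

```python
# Hypothetical model: each replica is an append-only set of (key, value, timestamp, origin) facts.
def insert(replica, key, value, ts, origin):
    replica.add((key, value, ts, origin))  # never overwrite; only add a new fact


def history(facts, key):
    """All values ever written for key, ordered by timestamp, then by origin for ties."""
    entries = [f for f in facts if f[0] == key]
    entries.sort(key=lambda f: (f[2], f[3]))
    return [f[1] for f in entries]


db1 = {("X", "initial", 0, "db1")}
db2 = {("X", "initial", 0, "db1")}

# Assume the client writing "A" acted later in real time, but db1's clock runs behind.
insert(db1, "X", "A", ts=1, origin="db1")
insert(db2, "X", "B", ts=2, origin="db2")

healed = db1 | db2  # merging is a set union: both writes survive
# history(healed, "X") == ["initial", "A", "B"], which contradicts the assumed real-time order.
```

Nothing is lost in the merge, so the remaining problem is purely about ordering, which is where clock synchronization (or a logical ordering scheme) comes in.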