Is it possible for a vector clock to be greater than another but they are not ancestor related?

270 views Asked by At

very new to distributed system, just start reading the dynamo paper 4.4 Data Versioning, so my understanding could be off.

Taking the example in the doc, the last step is to reconcile D3 and D4 to D5, but what if the user only retrieved D3 (IIUC this is possible, since there are multiple version of the data lingering in the system, I am not guaranteed to retrieve both D3 and D4, I can even read D1/D2), and without knowing the presence of D4, user updated D3 to D5 and by coincidence, the D5 is also processed by Sz(which processed D4). Doesn't it make D5 looks like a descendant of D4?:

D4([Sx,2],[Sz,1]) vs D5([Sx,2],[Sy,1],[Sz,1])

but this is wrong, either the step I described is not possible or the vector clock has some order requirement?

I see vector clock to carry less information than a linear history log, ie ([Sx,2],[Sy,1]) doesn't tell me the order of the action which took place, it could be expand to different time series like [Sx,Sy,Sx] or [Sy,Sx,Sx] or [Sx, Sx, Sy]?

enter image description here

2

There are 2 answers

3
Mike Dinescu On

Vector clocks by themselves are not enough to prevent the type of data inconsistency you describe.

You need something more: a consensus algorithm. Multiple such well studied protocols exist and they all are all designed to ensure that your data doesn’t end up in a state like you described.

For example, with a quorum protocol you could mandate that a majority of the nodes in the system must “observe” an update in order to be valid. And unless that happens, the write is considered invalid and rejected. Coupled with a vector clock you end up with a distributed system that is resilient to the type of problem you showed

0
Prabhath On

If the D5' is handled by Sz, then it already knows about D4, D5 will be ([Sx2,Sz2]) instead of D5([Sx,2],[Sy,1],[Sz,1]).

If D5" is handled by Sy then, it knows about D3, and D5 will be ([Sx2,Sy2])

If both of them happen then they are both causally independent and cannot be reconciled later.