How does Cassandra achieve strong consistency with failed writes if there is no rollback and no two phase commit?

62 views Asked by At

Looking for concrete evidence (documentation or source code) on the behavior of cassandra 3.0+ on the following situation

Write of key1, value1 is requested with consistency level of QUORUM but only N replica responded success where N < QUORUM

What happen to those N nodes that just updated key1? Do they get rollback?

In cassandra documentation https://docs.datastax.com/en/cassandra-oss/3.0/cassandra/dml/dmlTransactionsDiffer.html

if using a write consistency level of QUORUM with a replication factor of 3, Cassandra will replicate the write to all nodes in the cluster and wait for acknowledgement from two nodes. If the write fails on one of the nodes but succeeds on the other, Cassandra reports a failure to replicate the write on that node. However, the replicated write that succeeds on the other node is not automatically rolled back.

It mentions if some write failed and did not satisfied consistency level, coordinator will return failure, but the data will persist on nodes that have write succeeded

But this means strong consistency can never be achieved even if R + W > number of replica as official. documentation suggested

https://docs.datastax.com/en/cassandra-oss/3.0/cassandra/dml/dmlAboutDataConsistency.html

Consider the following situation

replica number = 5
consistency level write = 3
consistency level read = 3

If a write is attempted , but one nodes succeeds , coordinator will return failure, but that one node will not rollback, so you need a consistency level of 5 in order to achieve strong consistentcy

The documentation has conflicting information

What am I getting wrong here?

1

There are 1 answers

0
JayK On

I think there is a fundamental disconnect on what strong consistency refers to. It is not that the data is always on all nodes.

Let me explain.

In Cassandra, to achieve strong consistency you need to look at both reads and writes.

If we consider the example you gave:

replica number = 5 consistency level write = 3 consistency level read = 3

In the normal case, if you read from 3 nodes, and you write to 3 nodes, you are guaranteed to get the correct response even if the remaining 2 nodes do not receive the write. With the caveat that the client request did not fail. (This is what Cassandra calls 'Strong Consistency')

As opposed to 'eventual consistency' which happens for example if you read with 'consistency one' in this scenario instead. You might hit the node that didn't receive the update and get stale replies even if all client requests pass perfectly fine.

The client always checks if sufficient nodes are up for the query before it attempts it. If not enough nodes are available, an error is returned from the client without attempting the query.

For your case to occur, three nodes must go down at the same time after the client has done the check and before the client has had chance to notice the nodes being down.

In this specific case you might receive an error that indicates that the inserts could have been partially written. Under the definition above, it's still 'strong consistency', because the client still knows that something went wrong and can retry the write to fix it.

Also, if you happen to write partially, the next time you read this data, read repair will automatically spread the data unless you hit 3 nodes which do not have the data.

Hope this helps!