I am a little confused about how Cassandra ensures consistency when a new node is added to the cluster. I know Cassandra performs range movements and streams data to the newly added node. My question is: does Cassandra also stream data for the ranges where the new node is only a secondary replica?
For example, we have 4 nodes (A, B, C, D) in the cluster with RF=3: A(x=1, y=2), B(x=1, y=3), C(x=1), D(y=2). Partition key "x" is held by A, B, C, while partition key "y" is held by D, A, B. Suppose I add a new node A' between A and B. I think it will stream partition "x" from A. But does it also stream partition "y" from B or D?
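To make the example concrete, here is a minimal sketch of the ring before and after A' joins. It is a toy model, not Cassandra's code: it assumes single-token nodes and a SimpleStrategy-like placement (the owner of the token plus the next RF-1 nodes clockwise). The token values and the `replicas` helper are hypothetical, chosen only so that "x" lands on A, B, C and "y" on D, A, B.

```python
from bisect import bisect_left

def replicas(ring, key_token, rf=3):
    """Return the rf nodes that hold key_token: the first node whose token
    is >= key_token, then the next rf-1 nodes clockwise (toy placement)."""
    nodes = sorted(ring, key=ring.get)
    tokens = [ring[n] for n in nodes]
    start = bisect_left(tokens, key_token) % len(nodes)  # wrap past the last token
    return [nodes[(start + i) % len(nodes)] for i in range(rf)]

# Hypothetical tokens: x falls in A's primary range, y in D's primary range.
ring = {"A": 100, "B": 200, "C": 300, "D": 400}
keys = {"x": 50, "y": 350}

before = {k: replicas(ring, t) for k, t in keys.items()}

# Add A' between A and B.
ring_with_new = dict(ring, **{"A'": 150})
after = {k: replicas(ring_with_new, t) for k, t in keys.items()}

for k in keys:
    print(k, before[k], "->", after[k])
# x ['A', 'B', 'C'] -> ['A', "A'", 'B']
# y ['D', 'A', 'B'] -> ['D', 'A', "A'"]
```

In this model A' becomes a replica for both "x" and "y", while C drops out of the replica set for "x" and B drops out for "y". The sketch only shows ownership; it says nothing about which node Cassandra picks as the streaming source.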
If it does stream partition "y", which node will Cassandra choose to stream from? According to the official documentation, it will stream from the primary replica, which is D. If that's the case, and D has stale data (which is fine before adding the new node, since both A and B have the latest data, which satisfies quorum), then after streaming it is possible to read stale data from D and A'. Am I right?
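The scenario I am worried about can be sketched as follows. This is only a simulation of the reasoning above, under the assumption stated in the question that A' copies partition "y" from D. The timestamps, the values "old" and "new", and the `quorum_reads` helper are all hypothetical.

```python
from itertools import combinations

def quorum_reads(data, replica_set, quorum=2):
    """Return every value some quorum read could return: for each possible
    group of `quorum` replicas, the value with the highest timestamp wins."""
    results = set()
    for group in combinations(replica_set, quorum):
        newest = max((data[n] for n in group), key=lambda tv: tv[0])
        results.add(newest[1])
    return results

# (timestamp, value) of partition "y" on each replica.
# D missed the latest write; A and B have it, so the write met quorum.
data = {"D": (1, "old"), "A": (2, "new"), "B": (2, "new")}

before_reads = quorum_reads(data, ["D", "A", "B"])

# Assumption from the question: A' streams "y" from D and replaces B.
data["A'"] = data["D"]
after_reads = quorum_reads(data, ["D", "A", "A'"])

print(before_reads)  # every quorum of D, A, B includes A or B
print(after_reads)   # the quorum (D, A') contains only the stale copy
```

Before the change every pair of replicas contains at least one up-to-date copy, so only "new" can be returned. Afterwards the pair (D, A') holds only the stale copy, so a quorum read that happens to hit those two nodes returns "old".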
You are probably right. Running nodetool repair before adding a new node is recommended, so that the replicas are consistent with each other before any data is streamed.