Influnce of partition quantity on repair time in Cassandra cluster

103 views Asked by At

How does quantity of partitions influence on repair time in Cassandra cluster?

Is it correct that the less quantity of partitions the faster speed of Merkle tree algorithm and repair procedure?

Will repair faster for -

CREATE TABLE ks.t1 (
     id2 bigint,
     id1 bigint,
     name text,
     PRIMARY KEY (id2, id1, name)
);

than for

CREATE TABLE ks.t1 (
    id2 bigint,
    id1 bigint,
    name text,
    PRIMARY KEY ((id2, id1), name)
);  

If count(id2, id1) > count (id1) ?

1

There are 1 answers

2
doanduyhai On BEST ANSWER

When triggering repair, Cassandra will

  • read all SSTables locally on disk into memory
  • compute the Merkle Tree
  • exchange the Merkle Tree between different replicas
  • if there is a mismatch, a block of partitions will be sent on the network

Because Merkle tree resolution only allow 32768 leaf nodes. If there are more than 32768 partitions on a single replica, there will be many partitions that hash into the same leaf node. So if a single partition mismatches, we'll need to send all the block of partitions. That's what I call over repair

This issue is solved more or less by sub-range repair where, instead of repairing the whole token range for a table, Cassandra just attempts to repair a portion of the token range. The direct result is the Merkle Tree resolution will be higher since there are less partitions to repair.

So yes, it seems that having less partitions will reduce over repair.

But ....

In your example, less partition == wider partition which is not ideal either.

Why ? Because if there is a single cell mismatch in a wide partition, Cassandra will need to repair the entire partition, which is a waste of resource.

Furthermore, wide partition will make read path slower because the data is likely to span on many SSTables.

Conclusion, I would personally prefer PRIMARY KEY ((id2, id1), name) and use sub-range repair.