The data in the commit log is flushed to disk periodically, every 10 seconds by default (controlled by `commitlog_sync_period_in_ms`). So if all replicas crash within those 10 seconds, will I lose all that data? Does this mean that, theoretically, a Cassandra cluster can lose data?
Cassandra is configured to lose 10 seconds of data by default?
Asked by Aliaksandr Kazlou
If a node crashed right before updating the commit log on disk, then yes, you could lose up to ten seconds of data.
If you keep multiple replicas, either by using a replication factor higher than 1 or by having multiple data centers, then much of the lost data would still be on other nodes, and would be restored to the crashed node when it is repaired.
Also, the commit log may be written in less than ten seconds if the write volume is high enough to hit size limits before the ten seconds are up.
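For reference, the periodic behaviour described above corresponds roughly to the following `cassandra.yaml` fragment. This is a sketch using the setting names from the question; the value shown is the ten-second default discussed here, and newer Cassandra releases may name these options differently, so check the `cassandra.yaml` shipped with your version.

```yaml
# Periodic mode: writes are acknowledged immediately, and the commit log
# is synced to disk on a timer. A crash can lose whatever was written
# since the last sync.
commitlog_sync: periodic
commitlog_sync_period_in_ms: 10000   # 10 seconds, the default discussed above
```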
If you want more durability than this (at the cost of higher latency), then you can change the `commitlog_sync` setting from `periodic` to `batch`. In `batch` mode it uses the `commitlog_sync_batch_window_in_ms` setting to control how often batches of writes are written to disk. In batch mode the writes are not acked until written to disk.
The ten second default for periodic mode is designed for spinning disks, since they are so slow there is a performance hit if you block acks waiting for commit log writes. For this reason if you use `batch` mode, they recommend a dedicated disk for the commit log so that the write head doesn't need to do any seeks to keep the added latency as low as possible.
If you are using SSDs, then you can use more aggressive timing since the latency is greatly reduced compared to a spinning disk.
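As a rough illustration, switching to batch mode might look like the following `cassandra.yaml` fragment. The setting names are the ones mentioned above, plus `commitlog_directory` for the dedicated disk. The window value and the directory path are assumptions chosen for the example, not recommendations, and the exact option names and semantics can vary between Cassandra versions.

```yaml
# Batch mode: writes are not acknowledged until the commit log
# has been synced to disk, trading latency for durability.
commitlog_sync: batch
commitlog_sync_batch_window_in_ms: 2   # illustrative value; tune for your hardware

# Put the commit log on its own disk so its sequential writes do not
# compete with data-file I/O. The path below is hypothetical.
commitlog_directory: /mnt/commitlog
```

Only one `commitlog_sync` mode can be active at a time, so the periodic settings need to be removed or commented out when enabling this.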