We have a three-node Cassandra cluster with RF 3. One table uses SizeTieredCompactionStrategy. In some cases, running a major compaction on this table with
nodetool compact --split-output -- <keyspace> <table>
does not free up disk space, but running
nodetool garbagecollect -- <keyspace> <table>
does. gc_grace_seconds is set to 1 hour and default_time_to_live is set to 3 hours:
CREATE TABLE keyspace.table (
id text PRIMARY KEY,
json text
) WITH bloom_filter_fp_chance = 0.01
AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
AND comment = ''
AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
AND crc_check_chance = 1.0
AND dclocal_read_repair_chance = 0.1
AND default_time_to_live = 10800
AND gc_grace_seconds = 3600
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND read_repair_chance = 0.0
AND speculative_retry = '99PERCENTILE';
Does anyone know the reason?
Thanks in advance!
nodetool garbagecollect performs single-sstable compaction, so it can shrink individual files on disk. garbagecollect has been available since Cassandra 3.10 and removes deleted partitions and rows by default. If you specify -g cell, it also removes overwritten or deleted cells.
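To make the difference between the two granularities concrete, here is a toy Python model. It is only a sketch: the `Row` class, its `deleted` and `shadowed` fields, and the `garbage_collect` function are hypothetical names made up for illustration and are not Cassandra's implementation or on-disk format.

```python
from dataclasses import dataclass, field


@dataclass
class Row:
    """Hypothetical, simplified stand-in for one row inside a single sstable."""
    key: str
    deleted: bool = False                       # row or its partition has been deleted
    cells: dict = field(default_factory=dict)   # column name -> value
    shadowed: set = field(default_factory=set)  # columns overwritten or deleted elsewhere


def garbage_collect(rows, granularity="ROW"):
    """Rewrite one toy sstable, dropping data that is no longer visible.

    ROW  : drop deleted rows/partitions only (the default described above).
    CELL : additionally drop overwritten or deleted cells.
    """
    granularity = granularity.upper()
    if granularity not in ("ROW", "CELL"):
        raise ValueError("granularity must be ROW or CELL")

    kept = []
    for row in rows:
        if row.deleted:
            continue  # removed at both granularities
        cells = dict(row.cells)
        shadowed = set(row.shadowed)
        if granularity == "CELL":
            # Only the finer granularity inspects individual cells.
            cells = {c: v for c, v in cells.items() if c not in shadowed}
            shadowed = set()
        kept.append(Row(row.key, False, cells, shadowed))
    return kept
```

In this model, a row with `deleted=True` disappears either way, while a surviving row keeps its shadowed cells unless the granularity is CELL.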
nodetool compact: compaction combines several (typically four) smaller sstables into one while also cleaning up overwritten and expired data. Size-tiered compaction needs at least min_threshold sstables of similar size before it combines them.
Compaction may also look at an estimate of the number of droppable tombstones in an sstable, and compact that single sstable on its own if the ratio is above tombstone_threshold (0.2, or 20%, by default).
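The selection rule described in the last two paragraphs can be sketched in a few lines of Python. This is a simplified illustration of the rule as stated here, not Cassandra's actual code: `pick_compaction` is a hypothetical function, and its input is assumed to be the estimated droppable-tombstone ratio of each sstable in one group of similar-sized sstables.

```python
def pick_compaction(droppable_ratios, min_threshold=4, tombstone_threshold=0.2):
    """Decide what to do with one group of similar-sized sstables.

    droppable_ratios: estimated droppable-tombstone ratio per sstable (0.0 to 1.0).
    Returns (action, indexes) where action is "combine", "single" or "skip".
    """
    # Enough sstables in the group: combine them all into one.
    if len(droppable_ratios) >= min_threshold:
        return ("combine", list(range(len(droppable_ratios))))

    # Otherwise look for a single sstable whose ratio is above the threshold.
    candidates = [i for i, r in enumerate(droppable_ratios) if r > tombstone_threshold]
    if candidates:
        best = max(candidates, key=lambda i: droppable_ratios[i])
        return ("single", [best])

    # Too few sstables and no tombstone-heavy one: nothing is rewritten.
    return ("skip", [])
```

For example, `pick_compaction([0.05, 0.5])` selects the second sstable for a single-sstable compaction, while `pick_compaction([0.05, 0.1])` does nothing, which is the case where no disk space is reclaimed.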
The documentation on compact states:
DSE 6.7 nodetool compact
Thus garbagecollect will always run, but compact will skip an sstable if min_threshold (default 4) isn't met and its droppable tombstone ratio is not above tombstone_threshold. garbagecollect also requires less free disk space to run.