Why is my CockroachDB disk usage not decreasing?


I deleted a bunch of data from my CockroachDB database, but disk usage is not decreasing.


1 Answer

Answered by Jackson:

1. The data could be preserved for MVCC history.

CockroachDB implements Multi-Version Concurrency Control (MVCC), which means that it maintains a history of all mutations to a row. This history is used for a wide range of functionality: transaction isolation, historical AS OF SYSTEM TIME queries, incremental backups, changefeeds, cluster replication, and so on. The requirement to preserve history means that Cockroach 'soft' deletes data: The data is marked as deleted by a tombstone record so that Cockroach will no longer surface the deleted rows to queries, but the old data is still present on disk.
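
For illustration, here is a minimal sketch of that behavior (the table and column names are hypothetical): after the DELETE, the row no longer appears in ordinary reads, but its prior version can still be read with a historical query as long as it falls within the GC window.

```sql
CREATE TABLE IF NOT EXISTS events (id INT PRIMARY KEY, payload STRING);
INSERT INTO events VALUES (1, 'hello');

-- 'Soft' delete: the row disappears from current reads, but the old
-- version remains on disk as MVCC history until garbage collection.
DELETE FROM events WHERE id = 1;

-- Returns no rows:
SELECT * FROM events WHERE id = 1;

-- Reads the table as it existed 30 seconds ago; the deleted row is
-- still visible here while its history is retained.
SELECT * FROM events AS OF SYSTEM TIME '-30s' WHERE id = 1;
```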

The length of history preserved by MVCC is determined by two things: the gc.ttlseconds setting of the zone that contains the data, and whether any protected timestamps exist. You can check a range's stats (e.g., in the DB Console) to observe key_bytes, value_bytes, and live_bytes. The live_bytes metric reflects data that is not garbage, so (key_bytes + value_bytes) - live_bytes tells you how much MVCC garbage is resident within a range.
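
As a sketch (again using the hypothetical events table; exact syntax varies slightly between versions, with older releases using FROM TABLE instead of FOR TABLE), you can inspect the GC TTL that applies to a table, and the garbage estimate is just the arithmetic described above:

```sql
-- Shows the zone configuration in effect for the table, including
-- gc.ttlseconds (the actual value depends on your cluster's defaults).
SHOW ZONE CONFIGURATION FOR TABLE events;

-- Rough estimate of MVCC garbage in a range, from the range stats:
--   garbage_bytes = (key_bytes + value_bytes) - live_bytes
```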

When data has been deleted for at least the duration specified by gc.ttlseconds, Cockroach considers it eligible for 'garbage collection.' Asynchronously, Cockroach garbage-collects ranges that contain significant quantities of garbage, deleting the obsolete versions. Note that backups or other processes that have not yet completed but still require the old data may set a protected timestamp, which prevents garbage collection of that data until those processes finish.
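
If reclaiming space sooner matters more than keeping a long history window, the GC TTL can be lowered on the relevant zone. The 600-second value below is purely illustrative, and a shorter TTL also shortens the window available for AS OF SYSTEM TIME queries and incremental backups:

```sql
-- Lower the GC TTL for the (hypothetical) events table to 10 minutes,
-- so deleted versions become eligible for garbage collection sooner.
ALTER TABLE events CONFIGURE ZONE USING gc.ttlseconds = 600;
```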

2. The data could be in the process of being compacted.

When MVCC garbage is deleted by garbage collection, the data is still not physically removed from the filesystem. Removing it requires rewriting the files that contain it, which can be expensive. The Cockroach storage engine uses heuristics to compact data and drop deleted rows once enough garbage has accumulated to warrant a compaction. The storage engine strives to keep the overhead of obsolete data (called space amplification) at or below 10%; for example, with roughly 100 GiB of live data it aims to retain no more than about 10 GiB of obsolete data on disk. If a lot of data was just deleted, it may take the storage engine some time to compact the files and restore this property.

There is more information in the CockroachDB documentation on MVCC.