Does ClickHouse support partitioning like traditional RDBMS do, and if so how can I implement it?


How feasible is data partitioning in ClickHouse, and what implementation methods exist? Additionally, how do traditional relational database management systems (RDBMS) typically handle data partitioning?

There is 1 answer

Rich Raposa On

You partition a table in ClickHouse just like you do in your favorite old-school RDBMS - using a PARTITION BY clause.
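For illustration, here is a minimal sketch of what that clause can look like. The table name `events` and its columns are hypothetical (they are not from the question), and the monthly key is just one possible choice:

```sql
-- Hypothetical table, for illustration only.
CREATE TABLE events
(
    event_time DateTime,
    user_id    UInt64,
    message    String
)
ENGINE = MergeTree
-- toYYYYMM() maps each timestamp to a number such as 202401,
-- so all rows from the same calendar month share one partition.
PARTITION BY toYYYYMM(event_time)
ORDER BY (user_id, event_time);
```

Note that PARTITION BY is separate from ORDER BY: the former decides which rows may share a part, the latter how rows are sorted inside each part.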

The difference is in how ClickHouse stores the data on disk. Every time you do an INSERT into a MergeTree table, the inserted rows are written to their own folder on disk, called a part. Many small inserts therefore create many parts, so insert your data wisely (either lots of rows at once or using async inserts). You don't want too many parts. (Parts merge in the background, but that's a story for another day.)
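As a rough sketch of "inserting wisely", assume a hypothetical `events` table with `event_time`, `user_id`, and `message` columns (names invented for this example). The first statement sends several rows in one INSERT; the second leaves the batching to the server via the `async_insert` setting:

```sql
-- Hypothetical table and values, for illustration only.
-- One INSERT carrying many rows creates far fewer parts
-- than the same rows sent as many single-row INSERTs.
INSERT INTO events (event_time, user_id, message) VALUES
    ('2024-01-15 10:00:00', 1, 'login'),
    ('2024-01-15 10:00:05', 2, 'click'),
    ('2024-01-15 10:00:09', 1, 'logout');

-- With async inserts, the server buffers small inserts
-- and writes them out together.
INSERT INTO events (event_time, user_id, message)
SETTINGS async_insert = 1, wait_for_async_insert = 1
VALUES ('2024-01-15 10:01:00', 3, 'login');
```

In practice a batch would hold thousands of rows or more; three rows are shown only to keep the example short.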

When a table is partitioned, only rows with the same partition key value can go into the same part. So let's say you partition by a column that has 100,000 unique values. Then you are guaranteed, even on your best day, to have at least 100,000 parts in your cluster. That's too many, which means your choice of partitioning key was not good.
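One way to sanity-check a candidate key before committing to it is to count its distinct values, since that count is a lower bound on the number of parts. The sketch below assumes a hypothetical `events` table with `user_id` and `event_time` columns that already holds representative data:

```sql
-- Hypothetical table and columns, for illustration only.
-- Each distinct value of the partition key needs at least one part.
SELECT
    uniqExact(user_id)              AS partitions_if_by_user,
    uniqExact(toYYYYMM(event_time)) AS partitions_if_by_month
FROM events;
```

If the first number is in the tens of thousands and the second is a few dozen, the monthly key is the safer choice.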

In general, we have one recommendation for partitioning - especially when you are new to ClickHouse - and that is to only partition by month. All rows from the same month will be stored together, which means on your best day you might have only 12 parts per year. (That's an extreme simplification...but it makes my point.)
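To see how many parts each partition actually holds, you can query the `system.parts` table. This is a sketch that assumes a hypothetical table named `events` in the current database:

```sql
-- 'events' is a hypothetical table name, for illustration only.
-- Only active parts count; inactive ones were already merged away.
SELECT
    partition,
    count()   AS parts,
    sum(rows) AS total_rows
FROM system.parts
WHERE active
  AND database = currentDatabase()
  AND table = 'events'
GROUP BY partition
ORDER BY partition;
```

With a monthly key you would expect one row per month here, and the `parts` column will usually be above 1 until background merges catch up.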