Clickhouse time query seems to be slow

23 views Asked by At

I am moving all my iot data (currently into several small table orginized by tenants) into a general one with several MV.

The problem I have right now is that a simple query take long time to execute and i am not sure why?

Here is my table

CREATE TABLE IF NOT EXISTS tags  (
  timestamp DateTime64(3, 'UTC') DEFAULT now64(3, 'UTC') Codec(DoubleDelta, LZ4),
  insert_time DateTime64(3, 'UTC') DEFAULT now64(3, 'UTC') Codec(DoubleDelta, LZ4),
  tag_id String,
  numeric Nullable(Float64) Codec(Gorilla, LZ4),
  string Nullable(String),
  boolean Nullable(Bool),
  error Nullable(String)
)
ENGINE = MergeTree
PARTITION BY toYYYYMMDD(timestamp)
PRIMARY KEY (tag_id, timestamp)
ORDER BY (tag_id, timestamp)
SETTINGS index_granularity = 8192;

and if i run this type of query

SELECT
  timestamp, tag_id, numeric
FROM tags
WHERE 
  tag_id = 'A_TAG_ID'
 and ( timestamp >= '2024-03-26 00:00:00' AND timestamp <= '2024-03-26 23:59:59' )
ORDER BY timestamp DESC

it takes ~4s

When i use EXPLAIN indexes = 1 to see whats is going on, i am not sure to understand what i am doing wrong?

┌─explain────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ Expression (Projection)                                                                                                            │
│   Sorting (Sorting for ORDER BY)                                                                                                   │
│     Expression (Before ORDER BY)                                                                                                   │
│       ReadFromMergeTree (omnibus.tags)                                                                                             │
│       Indexes:                                                                                                                     │
│         MinMax                                                                                                                     │
│           Keys:                                                                                                                    │
│             timestamp                                                                                                              │
│           Condition: and((timestamp in (-Inf, '1711411200']), (timestamp in ['1710806400', +Inf)))                                 │
│           Parts: 429/3931                                                                                                          │
│           Granules: 623514/5402495                                                                                                 │
│         Partition                                                                                                                  │
│           Keys:                                                                                                                    │
│             toYYYYMMDD(timestamp)                                                                                                  │
│           Condition: and((toYYYYMMDD(timestamp) in (-Inf, 20240326]), (toYYYYMMDD(timestamp) in [20240319, +Inf)))                 │
│           Parts: 429/429                                                                                                           │
│           Granules: 623514/623514                                                                                                  │
│         PrimaryKey                                                                                                                 │
│           Keys:                                                                                                                    │
│             tag_id                                                                                                                 │
│             timestamp                                                                                                              │
│           Condition: and((tag_id in 2-element set), and((timestamp in (-Inf, '1711411200']), (timestamp in ['1710806400', +Inf)))) │
│           Parts: 400/429                                                                                                           │
│           Granules: 401/623514                                                                                                     │
└────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘

The size of the table is 219.98 GiB and ~44243338796 rows

this is an exemple of one of my MV

CREATE TABLE IF NOT EXISTS tags_1minute (
  bucket DateTime64(3, 'UTC') DEFAULT now64(3, 'UTC') Codec(DoubleDelta, LZ4),
  tag_id String,
  avg  AggregateFunction(avg, Nullable(Float64)),
  max  AggregateFunction(max, Nullable(Float64)),
  min  AggregateFunction(min, Nullable(Float64)),
  sum  AggregateFunction(sum, Nullable(Float64)),
  median  AggregateFunction(median, Nullable(Float64)),
  cnt  AggregateFunction(count)
) ENGINE = AggregatingMergeTree()
PARTITION BY toYYYYMM(bucket)
PRIMARY KEY (bucket, tag_id)
ORDER BY (bucket, tag_id)
SETTINGS index_granularity = 8192;
CREATE MATERIALIZED VIEW tags_1minute_mv TO tags_1minute AS
SELECT
  toStartOfMinute(timestamp) AS bucket,
  tag_id,
  avgState(numeric) AS avg,
  maxState(numeric) AS max,
  minState(numeric) AS min,
  sumState(numeric) AS sum,
  medianState(numeric) AS median,
  countState(numeric) AS cnt
FROM tags
WHERE isNotNull(numeric)
GROUP BY bucket, tag_id;

I need to mention that one tag is one type (either numeric, bool, string), that is why i use the nullable

You can reproduce with the given table and MV, I believe it is fairly simple to reproduce

0

There are 0 answers