I am moving all my iot data (currently into several small table orginized by tenants) into a general one with several MV.
The problem I have right now is that a simple query take long time to execute and i am not sure why?
Here is my table
CREATE TABLE IF NOT EXISTS tags (
timestamp DateTime64(3, 'UTC') DEFAULT now64(3, 'UTC') Codec(DoubleDelta, LZ4),
insert_time DateTime64(3, 'UTC') DEFAULT now64(3, 'UTC') Codec(DoubleDelta, LZ4),
tag_id String,
numeric Nullable(Float64) Codec(Gorilla, LZ4),
string Nullable(String),
boolean Nullable(Bool),
error Nullable(String)
)
ENGINE = MergeTree
PARTITION BY toYYYYMMDD(timestamp)
PRIMARY KEY (tag_id, timestamp)
ORDER BY (tag_id, timestamp)
SETTINGS index_granularity = 8192;
and if i run this type of query
SELECT
timestamp, tag_id, numeric
FROM tags
WHERE
tag_id = 'A_TAG_ID'
and ( timestamp >= '2024-03-26 00:00:00' AND timestamp <= '2024-03-26 23:59:59' )
ORDER BY timestamp DESC
it takes ~4s
When i use EXPLAIN indexes = 1 to see whats is going on, i am not sure to understand what i am doing wrong?
┌─explain────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ Expression (Projection) │
│ Sorting (Sorting for ORDER BY) │
│ Expression (Before ORDER BY) │
│ ReadFromMergeTree (omnibus.tags) │
│ Indexes: │
│ MinMax │
│ Keys: │
│ timestamp │
│ Condition: and((timestamp in (-Inf, '1711411200']), (timestamp in ['1710806400', +Inf))) │
│ Parts: 429/3931 │
│ Granules: 623514/5402495 │
│ Partition │
│ Keys: │
│ toYYYYMMDD(timestamp) │
│ Condition: and((toYYYYMMDD(timestamp) in (-Inf, 20240326]), (toYYYYMMDD(timestamp) in [20240319, +Inf))) │
│ Parts: 429/429 │
│ Granules: 623514/623514 │
│ PrimaryKey │
│ Keys: │
│ tag_id │
│ timestamp │
│ Condition: and((tag_id in 2-element set), and((timestamp in (-Inf, '1711411200']), (timestamp in ['1710806400', +Inf)))) │
│ Parts: 400/429 │
│ Granules: 401/623514 │
└────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
The size of the table is 219.98 GiB and ~44243338796 rows
this is an exemple of one of my MV
CREATE TABLE IF NOT EXISTS tags_1minute (
bucket DateTime64(3, 'UTC') DEFAULT now64(3, 'UTC') Codec(DoubleDelta, LZ4),
tag_id String,
avg AggregateFunction(avg, Nullable(Float64)),
max AggregateFunction(max, Nullable(Float64)),
min AggregateFunction(min, Nullable(Float64)),
sum AggregateFunction(sum, Nullable(Float64)),
median AggregateFunction(median, Nullable(Float64)),
cnt AggregateFunction(count)
) ENGINE = AggregatingMergeTree()
PARTITION BY toYYYYMM(bucket)
PRIMARY KEY (bucket, tag_id)
ORDER BY (bucket, tag_id)
SETTINGS index_granularity = 8192;
CREATE MATERIALIZED VIEW tags_1minute_mv TO tags_1minute AS
SELECT
toStartOfMinute(timestamp) AS bucket,
tag_id,
avgState(numeric) AS avg,
maxState(numeric) AS max,
minState(numeric) AS min,
sumState(numeric) AS sum,
medianState(numeric) AS median,
countState(numeric) AS cnt
FROM tags
WHERE isNotNull(numeric)
GROUP BY bucket, tag_id;
I need to mention that one tag is one type (either numeric, bool, string), that is why i use the nullable
You can reproduce with the given table and MV, I believe it is fairly simple to reproduce