I have a table with column type Array(String)
CREATE TABLE db.logs
( `timestamp` DateTime CODEC(Delta(4),
ZSTD(1)),
`message` String,
`source_type` LowCardinality(String),
`labels_key` Array(String),
`labels_value` Array(String),
INDEX lk_inv_idx labels_key TYPE inverted GRANULARITY 1,
INDEX lv_inv_idx labels_value TYPE inverted GRANULARITY 1
)
ENGINE = MergeTree
PARTITION BY timestamp
PRIMARY KEY (timestamp)
ORDER BY
(timestamp)
SETTINGS index_granularity = 8192;
array columns:
['color','make','year','type', 'interior'] ['red','toyota','2020','crossover', 'black']
['color','make','year','type', 'interior'] ['white','dodge','2023','pickup', 'black']
['color','make','year','type', 'interior'] ['red','audi','2021','sedan', 'red']
['color','make','year','type', 'interior'] ['yellow','bmw','2020','hatchback', 'yellow']
I want to search array columns using inverted index but when I search them using has() functions, it doesn't use the index and I can't use hasToken() (This is what I am using to search String columns with inverted array). Any ideas on how I can search using inverted index on Array(String) column. Thanks in advance.
According to https://clickhouse.com/docs/en/engines/table-engines/mergetree-family/mergetree#functions-support
has
supported by inverted indexbut you need to understand how indexes works in ClickHouse this is not indexes for search, when you firstly search in index and secondary search in main table space
this is DATA SKIP indexes, you skip granules (how many depends on GRANULARITY) inside data parts (system.parts) during filtering, so if your data spread by all granules, then secondary indexes are useless.
Look https://fiddle.clickhouse.com/df6acefe-5511-41e8-bd32-c3d1815d16d1
In output
Index
lk_inv_idxhas dropped 0/2 granules.
means, secondary index was used, but all granules containscolor
, so the index was useless.