Why does adding a tokenbf_v2 index to my Clickhouse table not have any effect

1.6k views Asked by At

I have a large table in Clickhouse with one column called 'Route' which is a comma separated String of id's. It looks e.g. like this: 123421,25245,346263. There can be hundreds of id's in one string.

The table is queried to select Routes that go through certain id's where the order matters, like this:

SELECT * FROM MY_TABLE WHERE Route LIKE '%123421%346263%'

Since the queries can be quite slow, I tried to speed it up by adding a data skipping index on the Route column. The tokenbf_v2 index seemed like made for my purpose, because as I understand it, it is supposed to break up my Route column into tokens and should help speeding up LIKE queries, according to the Clickhouse documentation.

However when I add the index with:

alter table MY_TABLE add INDEX route_index (Route) TYPE tokenbf_v1(256, 5, 0) GRANULARITY 1;

, and make sure it is created by running

OPTIMIZE TABLE MY_TABLE FINAL;

there is no speed up at all.

When I analyse the trace of my query in clickhouse client, it consistently shows:

Index `route_index` has dropped 0 granules.

Clearly my bloomfilter tokenbf index has no effect. Does anybody no why?

By the way, my Route column has type String and the granularity of table MY_TABLE is 128.

2

There are 2 answers

0
Denny Crane On BEST ANSWER

like actually can work

https://github.com/ClickHouse/ClickHouse/issues/15835#issuecomment-706711503

select Count(*) from MY_TABLE where Route like '%,3119550599,%'
Index `route_index` has dropped 781655 / 782032 granules.
1
vladimir On

To guarantee to apply the index to all data need to re-insert them all. I would recommend creating the test table with the required index and partially fill it. To use it as a playground to find more optimal indexes.


Consider to use hasToken-function or others ones allowed to tokenbf_v1-index (see Skipping index: functions support):

SELECT *
FROM (
  SELECT * 
  FROM MY_TABLE 
  WHERE hasToken(Route, '123421') AND hasToken(Route, '346263'))
WHERE Route LIKE '%123421,346263%'