Support MySQL find_in_set Multi-Valued Indexes

75 views Asked by At

Since MySQL version 8.0.17 there is Multi-Valued Indexes for JSON array column, is there also a simple possibility to create Multi-Valued Indexes for comma delimited string column (CHAR, VARCHAR, ...) to increase the performance of FIND_IN_SET function?

I know that there is an option to create a secondary table customers_zip and use the INNER JOIN or IN function, but this is a complicated option. I use 10k databases whit ~150 various tables -> ~1.5M different comma delimited columns.

Example table:

CREATE TABLE `customers` (
 `id` BIGINT NOT NULL AUTO_INCREMENT PRIMARY KEY,
 `zip_string` VARCHAR(100) NOT NULL,
 `zip_json` JSON,
 INDEX `zip_json` ((CAST(`zip_json` AS UNSIGNED ARRAY)))
);

Insert 1 000 000 rows:

INSERT INTO `customers` (`id`, `zip_string`, `zip_json`) VALUES 
 (1, '10001,10002,10003', '[10001,10002,10003]'),
 (2, '10004,10005', '[10004,10005]'),
 ...;

Select for FIND_IN_SET load 120ms

SELECT * FROM `customers` WHERE FIND_IN_SET('10005', `zip_string`);

Select for JSON_CONTAINS or MEMBER OF load 1ms

SELECT * FROM `customers` WHERE JSON_CONTAINS(`zip_json`, '10005');
SELECT * FROM `customers` WHERE 10005 MEMBER OF(`zip_json`);
2

There are 2 answers

0
Bill Karwin On

No, you can't make any kind of index to optimize searches for arbitrary elements in a comma-separated list.

The closest options are:

  • Convert the comma-separated list to a JSON array, and make a multi-valued index, however in current versions of MySQL this only works for integers. There is no support for making a multi-valued index of strings.

  • Use a functional index, but this only works for a fixed search criteria. That is, you could add index on the expression (FIND_IN_SET('10005', zip_string)), but the value 10005 is hard-coded in the index definition. It's not useful to find any other value.

  • Use a fulltext index, but this has limitations too.

It really is easier to optimize if you refactor to a normalized table design, storing individual values in their own row. Storing comma-separated lists working against the strengths of a relational database.

0
Liubarskis On

I created a little hack :)

Add a virtual JSON array column zip_index, whose contents will be automatically generated from the zip_string column, and which will be Multi-Valued Indexes

ALTER TABLE `customers` ADD COLUMN `zip_index`
   JSON GENERATED ALWAYS AS (CONCAT('[', `zip_string`, ']'))
      VIRTUAL NOT NULL;
CREATE INDEX `zip_index` ON `customers` ((CAST(`zip_index` AS UNSIGNED ARRAY)));

Select for JSON_CONTAINS load 2ms

SELECT * FROM `customers` WHERE JSON_CONTAINS(`zip_index`, '10005');

The content in column zip_index is automatically generated and indexed when using UPDATE on column zip_string or INSERT a new row (mabey reduced performance?). But it is not necessary to use multiple tables for each similar column (comma delimiter string). Using the JSON_CONTAINS function is still as easy as the FIND_IN_SET function.


Example for indexing non-integer lists

CREATE INDEX `zip_index` ON `customers` ((CAST(`zip_index` AS CHAR(100) ARRAY)));

The content of the original column zip_string must contain strings enclosed in quotes, like "A10001","B10002","C10003"