MySQL doesn't use entire index when combining join and range

348 views Asked by At

I'm trying to optimize a simple query that joins between two tables and applies a range condition. From the explain plan below, you can see that the index inv_quantity_on_hand is only used partially (4 bytes, only for the first column - inv_item_sk). I would expect the entire index to be used, as the second part (inv_quantity_on_hand) of the index is used in the WHERE clause in a range condition.

Please note that this happens only with a join and a range condition. Replacing the range condition to a constant equality comparison (inv_quantity_on_hand = 5) will change the explain plan and MySQL will use the entire index.

It seems to be an instance of this bug: https://bugs.mysql.com/bug.php?id=8569.

I checked it with MySQL 5.7 and it still happens. Anyone can think of a good workaround please?

Schema structure:

CREATE TABLE `inventory` (
    `inv_date_sk` INT(11) NOT NULL,
    `inv_item_sk` INT(11) NOT NULL,
    `inv_warehouse_sk` INT(11) NOT NULL,
    `inv_quantity_on_hand` INT(11) DEFAULT NULL,
    PRIMARY KEY (`inv_date_sk` , `inv_item_sk` , `inv_warehouse_sk`),
    KEY `inv_w` (`inv_warehouse_sk`),
    KEY `inv_i` (`inv_item_sk`),
    KEY `inv_quantity_on_hand_index` (`inv_item_sk` , `inv_quantity_on_hand`),
    CONSTRAINT `inv_d` FOREIGN KEY (`inv_date_sk`)
        REFERENCES `date_dim` (`d_date_sk`)
        ON DELETE NO ACTION ON UPDATE NO ACTION,
    CONSTRAINT `inv_i` FOREIGN KEY (`inv_item_sk`)
        REFERENCES `item` (`i_item_sk`)
        ON DELETE NO ACTION ON UPDATE NO ACTION,
    CONSTRAINT `inv_w` FOREIGN KEY (`inv_warehouse_sk`)
        REFERENCES `warehouse` (`w_warehouse_sk`)
        ON DELETE NO ACTION ON UPDATE NO ACTION
)  ENGINE=INNODB DEFAULT CHARSET=UTF8

CREATE TABLE `item` (
    `i_item_sk` INT(11) NOT NULL,
    `i_item_id` CHAR(16) NOT NULL,
    `i_rec_start_date` DATE DEFAULT NULL,
    `i_rec_end_date` DATE DEFAULT NULL,
    `i_item_desc` VARCHAR(200) DEFAULT NULL,
    `i_current_price` DECIMAL(7 , 2 ) DEFAULT NULL,
    `i_wholesale_cost` DECIMAL(7 , 2 ) DEFAULT NULL,
    `i_brand_id` INT(11) DEFAULT NULL,
    `i_brand` CHAR(50) DEFAULT NULL,
    `i_class_id` INT(11) DEFAULT NULL,
    `i_class` CHAR(50) DEFAULT NULL,
    `i_category_id` INT(11) DEFAULT NULL,
    `i_category` CHAR(50) DEFAULT NULL,
    `i_manufact_id` INT(11) DEFAULT NULL,
    `i_manufact` CHAR(50) DEFAULT NULL,
    `i_size` CHAR(20) DEFAULT NULL,
    `i_formulation` CHAR(20) DEFAULT NULL,
    `i_color` CHAR(20) DEFAULT NULL,
    `i_units` CHAR(10) DEFAULT NULL,
    `i_container` CHAR(10) DEFAULT NULL,
    `i_manager_id` INT(11) DEFAULT NULL,
    `i_product_name` CHAR(50) DEFAULT NULL,
    PRIMARY KEY (`i_item_sk`),
    KEY `item_color_index` (`i_color`)
)  ENGINE=INNODB DEFAULT CHARSET=UTF8

Query:

SELECT 
    *
FROM
    inventory
        INNER JOIN
    item ON inventory.inv_item_sk = item.i_item_sk
WHERE
    inventory.inv_quantity_on_hand > 100
        AND item.i_color = 'red';

Execution plan:

# id | select_type | table     | partitions | type | possible_keys                    | key                        | key_len | ref                  | rows | filtered |  Extra
-----+-------------+-----------+------------+------+----------------------------------+----------------------------+---------+----------------------+-----------------+-------------------------
1    | SIMPLE      | item      |            | ref  | PRIMARY,item_color_index         | item_color_index           | 61      | const                | 384  | 100.00   |  
1    | SIMPLE      | inventory |            | ref  | inv_i,inv_quantity_on_hand_index | inv_quantity_on_hand_index | 4       | tpcds.item.i_item_sk | 615  |  33.33   | Using where; Using index
2

There are 2 answers

4
iLikeMySql On

The multi column indexes are stored as concats of the different columns. I think MySQL will not evaluate a substring within a multi column index for comparison. When you use inv_quantity_on_hand = 5 (or in(1,2,3,4,5)) MySQL will build the strings from your input for comparison so it can use the full index. Using between or > basically provides an unlimited number of possible substrings to compare (before checking data type). Building all those strings and comparing them would take much more time than using the index for the first column (on-clause) and then check the inv_quantity_on_hand "using where".

1
Lalabhai Patel On

use BETWEEN condition instead of conditional operator