I'm trying to optimize a simple query that joins between two tables and applies a range condition. From the explain plan below, you can see that the index inv_quantity_on_hand is only used partially (4 bytes, only for the first column - inv_item_sk). I would expect the entire index to be used, as the second part (inv_quantity_on_hand) of the index is used in the WHERE clause in a range condition.
Please note that this happens only with a join and a range condition. Replacing the range condition to a constant equality comparison (inv_quantity_on_hand = 5) will change the explain plan and MySQL will use the entire index.
It seems to be an instance of this bug: https://bugs.mysql.com/bug.php?id=8569.
I checked it with MySQL 5.7 and it still happens. Anyone can think of a good workaround please?
Schema structure:
CREATE TABLE `inventory` (
`inv_date_sk` INT(11) NOT NULL,
`inv_item_sk` INT(11) NOT NULL,
`inv_warehouse_sk` INT(11) NOT NULL,
`inv_quantity_on_hand` INT(11) DEFAULT NULL,
PRIMARY KEY (`inv_date_sk` , `inv_item_sk` , `inv_warehouse_sk`),
KEY `inv_w` (`inv_warehouse_sk`),
KEY `inv_i` (`inv_item_sk`),
KEY `inv_quantity_on_hand_index` (`inv_item_sk` , `inv_quantity_on_hand`),
CONSTRAINT `inv_d` FOREIGN KEY (`inv_date_sk`)
REFERENCES `date_dim` (`d_date_sk`)
ON DELETE NO ACTION ON UPDATE NO ACTION,
CONSTRAINT `inv_i` FOREIGN KEY (`inv_item_sk`)
REFERENCES `item` (`i_item_sk`)
ON DELETE NO ACTION ON UPDATE NO ACTION,
CONSTRAINT `inv_w` FOREIGN KEY (`inv_warehouse_sk`)
REFERENCES `warehouse` (`w_warehouse_sk`)
ON DELETE NO ACTION ON UPDATE NO ACTION
) ENGINE=INNODB DEFAULT CHARSET=UTF8
CREATE TABLE `item` (
`i_item_sk` INT(11) NOT NULL,
`i_item_id` CHAR(16) NOT NULL,
`i_rec_start_date` DATE DEFAULT NULL,
`i_rec_end_date` DATE DEFAULT NULL,
`i_item_desc` VARCHAR(200) DEFAULT NULL,
`i_current_price` DECIMAL(7 , 2 ) DEFAULT NULL,
`i_wholesale_cost` DECIMAL(7 , 2 ) DEFAULT NULL,
`i_brand_id` INT(11) DEFAULT NULL,
`i_brand` CHAR(50) DEFAULT NULL,
`i_class_id` INT(11) DEFAULT NULL,
`i_class` CHAR(50) DEFAULT NULL,
`i_category_id` INT(11) DEFAULT NULL,
`i_category` CHAR(50) DEFAULT NULL,
`i_manufact_id` INT(11) DEFAULT NULL,
`i_manufact` CHAR(50) DEFAULT NULL,
`i_size` CHAR(20) DEFAULT NULL,
`i_formulation` CHAR(20) DEFAULT NULL,
`i_color` CHAR(20) DEFAULT NULL,
`i_units` CHAR(10) DEFAULT NULL,
`i_container` CHAR(10) DEFAULT NULL,
`i_manager_id` INT(11) DEFAULT NULL,
`i_product_name` CHAR(50) DEFAULT NULL,
PRIMARY KEY (`i_item_sk`),
KEY `item_color_index` (`i_color`)
) ENGINE=INNODB DEFAULT CHARSET=UTF8
Query:
SELECT
*
FROM
inventory
INNER JOIN
item ON inventory.inv_item_sk = item.i_item_sk
WHERE
inventory.inv_quantity_on_hand > 100
AND item.i_color = 'red';
Execution plan:
# id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra
-----+-------------+-----------+------------+------+----------------------------------+----------------------------+---------+----------------------+-----------------+-------------------------
1 | SIMPLE | item | | ref | PRIMARY,item_color_index | item_color_index | 61 | const | 384 | 100.00 |
1 | SIMPLE | inventory | | ref | inv_i,inv_quantity_on_hand_index | inv_quantity_on_hand_index | 4 | tpcds.item.i_item_sk | 615 | 33.33 | Using where; Using index
The multi column indexes are stored as concats of the different columns. I think MySQL will not evaluate a substring within a multi column index for comparison. When you use inv_quantity_on_hand = 5 (or in(1,2,3,4,5)) MySQL will build the strings from your input for comparison so it can use the full index. Using between or > basically provides an unlimited number of possible substrings to compare (before checking data type). Building all those strings and comparing them would take much more time than using the index for the first column (on-clause) and then check the inv_quantity_on_hand "using where".