Storage and efficient lookup for range data

62 views Asked by At

Problem Statement: We will be getting a request of a number (11 digits) and have to efficiently lookup in the database and return a single row (last updated) based on range in which it fits.

Current DB Structure::

Using MySQL

Currently, We have a table which has 2 columns i.e low_range and high_range which stores the range of data and there are 2 other columns which stores corresponding data i.e. is_active (values can 0 and 1) and code (int value which is id of another table i.e code_mapping).

Table 1 name: range_mapping

DB schema:

create table range_mapping (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`low_range` decimal(11,0) NOT NULL,
`high_range` decimal(11,0) NOT NULL,
`is_active` tinyint(1) NOT NULL DEFAULT 1,
`code` int(8) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `idx_comp_is_active_low_high_range` (`is_active`, `low_range`, `high_range`)
) ENGINE=InnoDB AUTO_INCREMENT=26891234 DEFAULT CHARSET=utf8

Table 2 name: code_mapping

DB schema:

create table code_mapping (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`name` varchar(100) NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `nameIdx` (`name`)
) ENGINE=InnoDB AUTO_INCREMENT=4410 DEFAULT CHARSET=utf8

Query to be optimized: I need to redesign or optimize the query for efficient and faster execution.

Request: 12345678912

Possible rows in table:

low_range: 12345678901 high_range: 12345678913
low_range: 12345678910 high_range: 12345678912
low_range: 12345678902 high_range: 12345678920
select a.low_range, a.high_range, b.name from range_mapping AS a 
LEFT JOIN code_mapping AS b ON a.code = b.id 
WHERE a.is_active = 1 and 12345678912 BETWEEN a.low_range AND a.high_range 
ORDER BY a.id DESC 
limit 1;

Issue:: When executing above query with 20 requests/sec, it is taking up to ~20 sec. I need to optimize the query or database to make it execute within 500 ms.

I already added the composite index which somehow optimized the query and also used force index as part of query. It is still taking more than 2 sec.

Explain query:

explain select a.low_range, a.high_range, b.name from range_mapping AS a force index(idx_comp_is_active_low_high_range) LEFT JOIN code_mapping AS b ON a.code = b.id WHERE a.is_active = 1 and 12345678912 BETWEEN a.low_range AND a.high_range ORDER BY a.id DESC limit 1; 

Output:

id: 1
select_type: SIMPLE
table: a
type: range
possible_keys: idx_comp_is_active_low_high_range
key: idx_comp_is_active_low_high_range
key_len: 11
ref: NULL
rows: 227190
Extra: Using index condition; Using filesort

***************************
id: 1
select_type: SIMPLE
table: b
type: eq_ref
possible_keys: PRIMARY
key: PRIMARY
key_len: 4
ref: testbackup.a.code
rows: 1
Extra: Using where

Expectation: How can I improve the database schema or optimize the query to fetch the data within ms.

1

There are 1 answers

4
O. Jones On

You are seeking the matching row for the largest value of a.id (from ORDER BY a.id DESC LIMIT 1) that matches your query filters. That means you can rewrite your query like this.

Let's start with a subquery.

SELECT MAX(id) id 
  FROM range_mapping 
 WHERE is_active = 1
   and low_range <= 12345678912
   AND high_range >= 12345678912

This subquery does the hard work. It has to look for a lot of stuff in your table, so simplifying it is good.

It can be accelerated with this compound index.

CREATE INDEX idx_comp_is_active_low_id_high_range ON range_mapping
    (is_active, low_range, id DESC, high_range)

To satisfy the subquery MySQL will random-access this index to the first eligible row according to the first two columns (is_active and low_range). Then it will scan through the index looking for the first index row matching your high_range criterion, and return the id value from that index entry. It's already the largest one.

Notice that the index I defined is almost the same as the one you have, except for the addition of id as the second-to-last column.

Next, we need to use that range_mapping.id value to retrieve the details from the first table and the name from the second table. That goes like this.

SELECT a.low_range, a.high_range, b.name
  FROM (
        SELECT MAX(id) id 
          FROM range_mapping 
         WHERE is_active = 1
           AND low_range <= 12345678912
           AND high_range >= 12345678912
       ) AS found
  JOIN range_mapping AS a ON found.id = a.id
  LEFT JOIN code_mapping AS b ON a.code = b.id

This should be faster. It doesn't have to sort anything, nor does it have to join more than one row.

You can write it with BETWEEN the same way.

SELECT a.low_range, a.high_range, b.name
  FROM (
        SELECT MAX(id) id 
          FROM range_mapping 
         WHERE is_active = 1
           AND 12345678912 BETWEEN low_range AND high_range
       ) AS found
  JOIN range_mapping AS a ON found.id = a.id
  LEFT JOIN code_mapping AS b ON a.code = b.id