GROUP BY WHERE range AND const ref without temporary

Question

GROUP BY WHERE range AND const ref without temporary

404 views Asked by Steve-o At 25 December 2014 at 03:56

A really basic table containing quotes on instruments across multiple exchanges leveraging the TokuDB storage engine:

CREATE TABLE `quotes` (
  `ticker` char(4) NOT NULL,
  `timestamp` time(3) NOT NULL,
  `price` decimal(7,2) unsigned NOT NULL,
  `size` smallint(5) unsigned NOT NULL,
  `exchange` char(3) NOT NULL,
  KEY `best_price` (`ticker`,`timestamp`,`exchange`,`price`),
  KEY `best_size` (`exchange`,`ticker`,`price`,`timestamp`)
) ENGINE=TokuDB

Whenever I query for best price across all exchanges it always uses a temporary table. The presence of exchange and price in the index only appears to produce an index scan being equivalent to a clustered key on (ticker, timestamp) in TokuDB.

EXPLAIN SELECT max(price),exchange
FROM quotes
WHERE
  ticker="A" AND
  timestamp BETWEEN "15:15:22.328961" AND "15:17:22.328961"
GROUP BY exchange
ORDER BY NULL \G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: quotes
         type: range
possible_keys: best_price
          key: best_price
      key_len: 9
          ref: NULL
         rows: 2690
        Extra: Using where; Using index; Using temporary

Is it possible to define a configuration that would not use a temporary table? This appears only possible when dropping the timestamp article:

EXPLAIN SELECT max(price),exchange
FROM quotes
WHERE
  ticker="A"
GROUP BY exchange
ORDER BY NULL \G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: quotes
         type: range
possible_keys: best_price
          key: best_size
      key_len: 7
          ref: NULL
         rows: 96
        Extra: Using where; Using index for group-by

Example output from best price query:

+------------+----------+
| max(price) | exchange |
+------------+----------+
|      41.06 | BTY      |
|      41.06 | DEA      |
|      41.07 | NYS      |
|      41.07 | THM      |
|      41.06 | PSE      |
|      41.07 | BAT      |
|      41.06 | DEX      |
|      41.06 | BOS      |
|      41.06 | ADC      |
|      41.06 | XPH      |
+------------+----------+
10 rows in set (0.01 sec)

And the bottleneck (3ms) is processing each individual row in the time range:

+----------------------+----------+
| Status               | Duration |
+----------------------+----------+
| starting             | 0.000071 |
| checking permissions | 0.000005 |
| Opening tables       | 0.000016 |
| After opening tables | 0.000006 |
| System lock          | 0.000014 |
| Table lock           | 0.000002 |
| After table lock     | 0.000005 |
| init                 | 0.000038 |
| optimizing           | 0.000024 |
| statistics           | 0.000155 |
| preparing            | 0.000028 |
| executing            | 0.000003 |
| Copying to tmp table | 0.000031 |
| Copying to tmp table | 0.003381 |
| Sending data         | 0.000017 |
| end                  | 0.000004 |
| removing tmp table   | 0.000020 |
| end                  | 0.000002 |
| query end            | 0.000005 |
| closing tables       | 0.000005 |
| freeing items        | 0.000006 |
| updating status      | 0.000011 |
| cleaning up          | 0.000002 |
+----------------------+----------+

The time range contains 2316 rows, broken down by exchange:

+----------+----------+
| exchange | count(*) |
+----------+----------+
| ADC      |       71 |
| BAT      |      298 |
| BOS      |      129 |
| BTY      |      266 |
| DEA      |      153 |
| DEX      |       60 |
| NYS      |      530 |
| PSE      |      325 |
| THM      |      453 |
| XPH      |       31 |
+----------+----------+

I've tried the crazy and added every permutation of covering index and MariaDB cannot find a better key. Are there other databases I should be looking at instead?

Example data set for time range and ticker symbol: http://pastebin.com/b5RcTXAs

Original Q&A

There are 1 answers

**Steve-o** · Accepted Answer · 2014-12-28T04:30:27+00:00

The answer is all about optimisation. MySQL has chosen to use a temporary table because it is considered more sensible than using the indexes to fetch each exchange. If one uses a clustered index across exchange,ticker,timestamp then the query can operate without a temporary:

MariaDB [trth]> explain SELECT min(ask_price),exchange
FROM quotes
USE INDEX (exchange_ticker_timestamp)
WHERE exchange IN ("NYS","BOS","CIN","ADC","DEX","DEA","MID","PSE","THM","WCB","BAT","XPH","BTY") AND 
   ticker="A" AND
   timestamp BETWEEN "15:15:22.328961" AND "15:17:22.328961"
GROUP BY exchange
ORDER BY NULL \G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: quotes
         type: range
possible_keys: exchange_ticker_timestamp
          key: exchange_ticker_timestamp
      key_len: 10
          ref: NULL
         rows: 2589
        Extra: Using where; Using index

MariaDB [trth]> show profile; 
+----------------------+----------+
| Status               | Duration |
+----------------------+----------+
| starting             | 0.000079 |
| checking permissions | 0.000006 |
| Opening tables       | 0.000014 |
| After opening tables | 0.000011 |
| System lock          | 0.000014 |
| Table lock           | 0.000003 |
| After table lock     | 0.000005 |
| init                 | 0.000043 |
| optimizing           | 0.000019 |
| statistics           | 0.000234 |
| preparing            | 0.000027 |
| executing            | 0.000008 |
| Sorting result       | 0.000002 |
| Sending data         | 0.002985 |
| end                  | 0.000006 |
| query end            | 0.000010 |
| closing tables       | 0.000006 |
| freeing items        | 0.000007 |
| updating status      | 0.000138 |
| cleaning up          | 0.000004 |
+----------------------+----------+

Compare with a temporary table grouping:

MariaDB [trth]> explain SELECT min(ask_price),exchange
FROM quotes
WHERE ticker="A" AND
   timestamp BETWEEN "15:15:22.328961" AND "15:17:22.328961"
GROUP BY exchange
ORDER BY NULL \G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: quotes
         type: range
possible_keys: ticker_timestamp
          key: ticker_timestamp
      key_len: 9
          ref: NULL
         rows: 1515
        Extra: Using where; Using temporary

MariaDB [trth]> show profile;  
+----------------------+----------+
| Status               | Duration |
+----------------------+----------+
| starting             | 0.000091 |
| checking permissions | 0.000009 |
| Opening tables       | 0.000037 |
| After opening tables | 0.000009 |
| System lock          | 0.000052 |
| Table lock           | 0.000004 |
| After table lock     | 0.000009 |
| init                 | 0.000049 |
| optimizing           | 0.000025 |
| statistics           | 0.000144 |
| preparing            | 0.000039 |
| executing            | 0.000003 |
| Copying to tmp table | 0.000040 |
| Copying to tmp table | 0.004674 |
| Sending data         | 0.000020 |
| end                  | 0.000003 |
| removing tmp table   | 0.000015 |
| end                  | 0.000003 |
| query end            | 0.000004 |
| closing tables       | 0.000006 |
| freeing items        | 0.000006 |
| updating status      | 0.000204 |
| cleaning up          | 0.000004 |
+----------------------+----------+

What is interesting to note here is that the first query scans more rows but is executed faster than the second.

TechQA.

GROUP BY WHERE range AND const ref without temporary

There are 1 answers

Related Questions in MYSQL

Related Questions in MARIADB

Related Questions in TOKUDB

Popular Questions

Popular Tags

Trending Questions