A really basic table containing quotes on instruments across multiple exchanges leveraging the TokuDB storage engine:
CREATE TABLE `quotes` (
`ticker` char(4) NOT NULL,
`timestamp` time(3) NOT NULL,
`price` decimal(7,2) unsigned NOT NULL,
`size` smallint(5) unsigned NOT NULL,
`exchange` char(3) NOT NULL,
KEY `best_price` (`ticker`,`timestamp`,`exchange`,`price`),
KEY `best_size` (`exchange`,`ticker`,`price`,`timestamp`)
) ENGINE=TokuDB
Whenever I query for best price across all exchanges it always uses a temporary table. The presence of exchange
and price
in the index only appears to produce an index scan being equivalent to a clustered key on (ticker, timestamp)
in TokuDB.
EXPLAIN SELECT max(price),exchange
FROM quotes
WHERE
ticker="A" AND
timestamp BETWEEN "15:15:22.328961" AND "15:17:22.328961"
GROUP BY exchange
ORDER BY NULL \G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: quotes
type: range
possible_keys: best_price
key: best_price
key_len: 9
ref: NULL
rows: 2690
Extra: Using where; Using index; Using temporary
Is it possible to define a configuration that would not use a temporary table? This appears only possible when dropping the timestamp
article:
EXPLAIN SELECT max(price),exchange
FROM quotes
WHERE
ticker="A"
GROUP BY exchange
ORDER BY NULL \G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: quotes
type: range
possible_keys: best_price
key: best_size
key_len: 7
ref: NULL
rows: 96
Extra: Using where; Using index for group-by
Example output from best price query:
+------------+----------+
| max(price) | exchange |
+------------+----------+
| 41.06 | BTY |
| 41.06 | DEA |
| 41.07 | NYS |
| 41.07 | THM |
| 41.06 | PSE |
| 41.07 | BAT |
| 41.06 | DEX |
| 41.06 | BOS |
| 41.06 | ADC |
| 41.06 | XPH |
+------------+----------+
10 rows in set (0.01 sec)
And the bottleneck (3ms) is processing each individual row in the time range:
+----------------------+----------+
| Status | Duration |
+----------------------+----------+
| starting | 0.000071 |
| checking permissions | 0.000005 |
| Opening tables | 0.000016 |
| After opening tables | 0.000006 |
| System lock | 0.000014 |
| Table lock | 0.000002 |
| After table lock | 0.000005 |
| init | 0.000038 |
| optimizing | 0.000024 |
| statistics | 0.000155 |
| preparing | 0.000028 |
| executing | 0.000003 |
| Copying to tmp table | 0.000031 |
| Copying to tmp table | 0.003381 |
| Sending data | 0.000017 |
| end | 0.000004 |
| removing tmp table | 0.000020 |
| end | 0.000002 |
| query end | 0.000005 |
| closing tables | 0.000005 |
| freeing items | 0.000006 |
| updating status | 0.000011 |
| cleaning up | 0.000002 |
+----------------------+----------+
The time range contains 2316 rows, broken down by exchange:
+----------+----------+
| exchange | count(*) |
+----------+----------+
| ADC | 71 |
| BAT | 298 |
| BOS | 129 |
| BTY | 266 |
| DEA | 153 |
| DEX | 60 |
| NYS | 530 |
| PSE | 325 |
| THM | 453 |
| XPH | 31 |
+----------+----------+
I've tried the crazy and added every permutation of covering index and MariaDB cannot find a better key. Are there other databases I should be looking at instead?
Example data set for time range and ticker symbol: http://pastebin.com/b5RcTXAs
The answer is all about optimisation. MySQL has chosen to use a temporary table because it is considered more sensible than using the indexes to fetch each exchange. If one uses a clustered index across
exchange,ticker,timestamp
then the query can operate without a temporary:Compare with a temporary table grouping:
What is interesting to note here is that the first query scans more rows but is executed faster than the second.