Consider two tables like this:

TABLE: current
 -------------------
| id | dept | value |
|----|------|-------|
|   4|    A |    20 |
|   5|    B |    15 |
|   6|    A |    25 |
 -------------------

TABLE: history
 -------------------
| id | dept | value |
|----|------|-------|
|   1|    A |    10 |
|   2|    C |    10 |
|   3|    B |    20 |
 -------------------

These are just simple examples... in the actual system both tables have considerably more columns and considerably more rows (10k+ rows in current and 1M+ rows in history).

A client application is continuously (several times a second) inserting new rows into the current table, and 'moving' older existing rows from current to history (delete/insert within a single transaction).

Without blocking the client in this activity we need to take a consistent sum of values per dept across the two tables.

With transaction isolation level set to REPEATABLE READ we could just do:

SELECT dept, sum(value) FROM current GROUP BY dept;

followed by

SELECT dept, sum(value) FROM history GROUP BY dept;

and add the two sets of results together. BUT each query would block inserts on its respective table.

Changing the isolation level to READ COMMITTED and doing the same two SQLs would avoid blocking inserts, but now there is a risk of entries being double counted if moved from current to history while we are querying (since each SELECT creates its own snapshot).

Here's the question then.... what happens with isolation level READ COMMITTED if I do a UNION:

SELECT dept, sum(value) FROM current GROUP BY dept
UNION ALL
SELECT dept, sum(value) FROM history GROUP BY dept;

Will MySQL generate a consistent snapshot of both tables at the same time (thereby removing the risk of double counting) or will it still take snapshot one table first, then some time later take snapshot of the second?

1

There are 1 answers

0
David B On

I have not yet found any conclusive documentation to answer my question, so I went about trying to prove it instead. Although not proof in the scientific sense, my findings suggest a consistent snapshot is created for all tables in a UNION query.

Here's what I did.

Create the tables

DROP TABLE IF EXISTS `current`;

CREATE TABLE IF NOT EXISTS `current` (
  `id` BIGINT NOT NULL COMMENT 'Unique numerical ID.',
  `dept` BIGINT NOT NULL COMMENT 'Department',
  `value` BIGINT NOT NULL COMMENT 'Value',
  PRIMARY KEY (`id`));


DROP TABLE IF EXISTS `history`;

CREATE TABLE IF NOT EXISTS `history` (
  `id` BIGINT NOT NULL COMMENT 'Unique numerical ID.',
  `dept` BIGINT NOT NULL COMMENT 'Department',
  `value` BIGINT NOT NULL COMMENT 'Value',
  PRIMARY KEY (`id`));

Create a procedure that sets up 10 entries in the current table (id = 0, .. 9), then sits in a tight loop inserting 1 new row into current and 'moving' the oldest row from current to history. Each iteration is performed in a transaction, as a result the current table remains at a steady 10 rows, while the history table grows quickly. At any point in time min(current.id) = max(history.id) + 1

DROP PROCEDURE IF EXISTS `idLoop`;

DELIMITER $$
CREATE PROCEDURE `idLoop`()
BEGIN

DECLARE n bigint;

-- Populate initial 10 rows in current table if not already there
SELECT IFNULL(MAX(id), -1) + 1 INTO n from current;
START TRANSACTION;
WHILE n < 10 DO
  INSERT INTO current VALUES (n, n % 10, n % 1000);
  SET n = n + 1;
END WHILE;
COMMIT;

-- In tight loop, insert new row and 'move' oldest current row to history
WHILE n < 10000000 DO
  START TRANSACTION;
  -- Insert new row to current
  INSERT INTO current values(n,  n % 10, n % 1000);
  -- Move oldest row from current to history
  INSERT INTO history SELECT * FROM current WHERE id = (n - 10);
  DELETE FROM current where id = (n - 10);
  COMMIT;
  SET n = n + 1;
END WHILE;

END$$
DELIMITER ;

Start this procedure running (this call won't return for some time - which is intentional)

call idLoop();

In another session on the same database we can now try out a variation on the UNION ALL query in my original posting.

I have modified it to (a) slow down execution,and (b) return a simple result set (two rows) that indicates whether any entries 'moved' whilst the query was running have been missed or double counted.

SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED;

SELECT 'HST' AS src, MAX(id) AS idx, COUNT(*) AS cnt, SUM(value) FROM history WHERE dept IN (0, 1, 2, 3, 4, 5, 6, 7, 8, 9)
UNION ALL
SELECT 'CRT' AS src, MIN(id) AS idx, COUNT(*) AS cnt, SUM(value) FROM current WHERE dept IN (0, 1, 2, 3, 4, 5, 6, 7, 8, 9);

The sum(value) and where dept in (...) are just there to add work to the query and slow it down.

The indication of a positive outcome is if the two idx values are adjacent, like this:

+-----+--------+--------+------------+
| src | idx    | cnt    | SUM(value) |
+-----+--------+--------+------------+
| HST | 625874 | 625875 |  312569875 |
| CRT | 625875 |     10 |       8795 |
+-----+--------+--------+------------+
2 rows in set (1.43 sec)

I'd still be happy to hear any authoritative information on this.