Consider two tables like this:
TABLE: current
-------------------
| id | dept | value |
|----|------|-------|
| 4| A | 20 |
| 5| B | 15 |
| 6| A | 25 |
-------------------
TABLE: history
-------------------
| id | dept | value |
|----|------|-------|
| 1| A | 10 |
| 2| C | 10 |
| 3| B | 20 |
-------------------
These are just simple examples... in the actual system both tables have considerably more columns and considerably more rows (10k+ rows in current and 1M+ rows in history).
A client application is continuously (several times a second) inserting new rows into the current table, and 'moving' older existing rows from current to history (delete/insert within a single transaction).
Without blocking the client in this activity we need to take a consistent sum of values per dept across the two tables.
With transaction isolation level set to REPEATABLE READ we could just do:
SELECT dept, sum(value) FROM current GROUP BY dept;
followed by
SELECT dept, sum(value) FROM history GROUP BY dept;
and add the two sets of results together. BUT each query would block inserts on its respective table.
Changing the isolation level to READ COMMITTED and doing the same two SQLs would avoid blocking inserts, but now there is a risk of entries being double counted if moved from current to history while we are querying (since each SELECT creates its own snapshot).
Here's the question then.... what happens with isolation level READ COMMITTED if I do a UNION:
SELECT dept, sum(value) FROM current GROUP BY dept
UNION ALL
SELECT dept, sum(value) FROM history GROUP BY dept;
Will MySQL generate a consistent snapshot of both tables at the same time (thereby removing the risk of double counting) or will it still take snapshot one table first, then some time later take snapshot of the second?
I have not yet found any conclusive documentation to answer my question, so I went about trying to prove it instead. Although not proof in the scientific sense, my findings suggest a consistent snapshot is created for all tables in a UNION query.
Here's what I did.
Create the tables
Create a procedure that sets up 10 entries in the current table (id = 0, .. 9), then sits in a tight loop inserting 1 new row into current and 'moving' the oldest row from current to history. Each iteration is performed in a transaction, as a result the current table remains at a steady 10 rows, while the history table grows quickly. At any point in time min(current.id) = max(history.id) + 1
Start this procedure running (this call won't return for some time - which is intentional)
In another session on the same database we can now try out a variation on the UNION ALL query in my original posting.
I have modified it to (a) slow down execution,and (b) return a simple result set (two rows) that indicates whether any entries 'moved' whilst the query was running have been missed or double counted.
The
sum(value)
andwhere dept in (...)
are just there to add work to the query and slow it down.The indication of a positive outcome is if the two idx values are adjacent, like this:
I'd still be happy to hear any authoritative information on this.