Unclear about how repeatable-read works in RDBMS

89 views Asked by At

I'm uncertain about how repeatable-read operates in RDBMS. I initially thought it functions similarly to Git branching, where an exact copy of the current database is utilized. However, I'm puzzled by the following scenario:

Tx1                                       Tx2
1. BEGIN;                                 1. BEGIN;
2. SELECT * FROM TABLE; (10 ROWS)         2. SELECT * FROM TABLE; (10 ROWS)

3. INSERT NEW ROW IN TABLE;
4. COMMIT;

                                          3. UPDATE ROW WITH ID=11; (executes successfully)

When Tx1 and Tx2 are operating with independent snapshots, how can Tx2 access the newly created row by Tx1? I recognize this as an instance of phantom reads but am unclear about why it occurred.

1

There are 1 answers

6
SerhiiH On

You might mistaken phantom reads with non-repeatable reads and your example is specific to MySQL.

REPEATABLE READ

... It prevents any rows that are queried from being changed by other transactions, thus blocking non-repeatable reads but not phantom reads ...

That phantom read you are getting is because you are inserting a new row, hence in doesn't modify any data that Tx2 was reading in the first place. This case is documented (see link above) and allowing phantom reads!
This is common for RDBMS to allow phantom reads on REPEATABLE READ isolation level. However different RDBMS defines it differently.

This is handy reference that is possible and what not per isolation level:

Isolation Level Dirty Read Nonrepeatable Read Phantom Read
Read uncommitted Possible Possible Possible
Read committed Not possible Possible Possible
Repeatable read Not possible Not possible Possible
Serializable Not possible Not possible Not possible

This is what MySQL documentation is saying about phantom rows

SELECT * FROM child WHERE id > 100 FOR UPDATE;
The query scans the index starting from the first record where id is bigger than 100. Let the table contain rows having id values of 90 and 102. If the locks set on the index records in the scanned range do not lock out inserts made in the gaps (in this case, the gap between 90 and 102), another session can insert a new row into the table with an id of 101. If you were to execute the same SELECT within the same transaction, you would see a new row with an id of 101 (a “phantom”) in the result set returned by the query.

As you can see it doesn't specify if gap is also from 103 to MAX_INT. Trying execute in a lock mode, you will see phantom rows.

SELECT * FROM TABLE FOR UPDATE; (11 ROWS)
SELECT * FROM TABLE FOR SHARE; (11 ROWS)

This is why your UPDATE was successful.

Interestingly enough if you run regular non locking SELECT after locking one, phantom rows will disappear.

SELECT * FROM TABLE; (10 ROWS)