I have the following table in Postgres 14:
CREATE TABLE mytable (
id text PRIMARY KEY
, top bigint NOT NULL
, top_timestamp bigint NOT NULL
);
I would like to upsert top and top_timestamp while returning the old values (if any). If the row already exists, I want to update it only if the new value of top is > old value.
The solution also HAS to account for concurrent writes.
For the upsert, I have this so far:
INSERT INTO mytable
AS mt (id, top, top_timestamp)
VALUES ('some-id', 123, 999)
ON CONFLICT (id)
DO UPDATE SET top=123, top_timestamp=999
WHERE mt.top < 123
I realise that RETURNING can only refer to new values, so to return the old values, I have this:
WITH old_values AS (
SELECT top, top_timestamp FROM mytable
WHERE id = 'some-id' FOR UPDATE
)
, upd AS ($upsertSqlAbove)
SELECT top AS old_top, top_timestamp AS old_top_timestamp
FROM old_values;
This seems to work, but is it safe for concurrent writes?
Does SELECT FOR UPDATE work as I expect here? Namely, it will lock the row until the whole query is finished?
What is the difference between that first query and this:
INSERT INTO mytable
AS mt (id, top, top_timestamp)
VALUES ('some-id', 123, 999)
ON CONFLICT (id)
DO UPDATE SET top=123, top_timestamp=999
WHERE mt.top < 123
RETURNING (SELECT top FROM mytable WHERE id = 123) AS last_top,
(SELECT top_timestamp FROM mytable WHERE id = 123) AS last_top_timestamp
Here is a fiddle to test.
What's the issue?
You ask:
Either reports old values from the last row version that has just been updated in a concurrency-safe manner. But the first one is more expensive. It locks an existing row earlier. So it holds the lock for longer and possibly blocks concurrent writes to the same row for longer. It also executes one more statement. Most importantly, it locks the row even when no later
UPDATEhappens, which is a complete waste.So the 2nd query is better. Except for two flaws:
It runs a separate
SELECTfor every column. That would seem necessary since a subquery expression in theRETURNINGclause can only return a single value.It runs those
SELECTqueries even after anINSERT. Then theSELECTis guaranteed to come up empty - at full cost.Return the old row only if updated
Unfortunately, the
UPDATEpart of an UPSERT does not allow additional tables in aFROMclause - unlike regularUPDATE. See:So your attempts are well-founded.
This query fixes both flaws (tested in Postgres 14 & 16):
fiddle
(To clarify for the general audience: The subquery in the
RETURNINGclause operates on the same snapshot of the table and does not see any row inserted or updated in the same statement, yet. It sees the old row version.)The test
t.xmax <> 0filters actual updates. See:And we can circumvent the discussed restriction for subqueries by returning a single (well-known) row type, and then de-composing it - with
(SELECT ...).*.Plus, use the special row variable
EXCLUDED. It holds the row proposed for insert, and helps to avoid spelling out input values repeatedly. (Subtle difference: all default values and triggers have already been applied.)This returns the whole row. To only return selected columns, what the UPSERT in a CTE and decompose later:
Note how I decompose the row type in a subquery once. This is to avoid repeated evaluation. See:
Return the old row in any case
According to your comment, you even want the old row back if no update happens. If the
WHEREcondition is not true, no update happens andRETURNINGdoes not return any rows. That's an even trickier combination ofSELECT,INSERTandUPDATE(UPSERT). Above query won't cut it. Your CTE solution would work. Or an empty update like used here:Simple, but wasteful - as explained in my answer to the same question:
Combine everything we have learned so far with this:
To arrive at this optimized function for single-row UPSERT:
Call:
Or:
fiddle
OUT old_row tbluses the registered row type of the table as "OUT" parameter. Replace with your actual (schema-qualified) table name. Obviously, adapt all column names and types to your actual table definition.Should be the perfect solution. Keeps locks to a minimum (doesn't write-lock at all, when just returning an old, unchanged row), and handles every corner case concurrent writes might produce in default
READ COMMITTEDisolation level. The call becomes super simple and short as demonstrated.Possible downsides:
CASCADEto theDROPcommand.