Postgres (AWS Aurora) is not enforcing unique index/constraint

988 views Asked by At

We are using Postgres for our production database, it's technically an Amazon AWS Aurora database using the 10.11 engine version. It doesn't seem to be under any unreasonable load (100-150 concurrent connections, CPU always under 10%, about 50% of the memory used, spikes to 300 write IOPS / 1500 read IOPS per second).

We like to ensure really good data consistency, so we make extensive use of foreign keys, triggers to validate data as it's being inserted/updated and also lots of unique constraints.

Most of the writes originate from simple REST API requests, which result in very standard insert and update queries. However, in some cases we also use triggers and functions to handle more complicated logic. For example, an update to one table will result in some fairly complicated cascading updates to other tables.

All queries are always wrapped in transactions, and for the most part we do not make use of explicit locking.

So what's wrong?

We have many (dozens of rows, across dozens of tables) instances where data exists in the database which does not conform to our unique constraints.

Sometimes the created_at and updated_at timestamps for the offending rows are identical, other times they are very similar (within half a second). This leads me to believe that this is being caused by a race condition.

We're not certain, but are fairly confident that the thing in common with these records is that the writes either triggered a function (the record was written from a simple insert or update, and caused several other tables to be updated) or that the write came from a function (a different record was written from a simple insert or update, which triggered a function that wrote the offending data).

From what I have been able to research, unique constraints/indexes are incredibly reliable and "just work". Is this true? If so, then why might this be happening?

Here is an example of some offending data, I've had to black out some of it, but I promise you the values in the user_id field are identical. As you will see below, there is a unique index across user_id, position, and undeleted. So the presence of this data should be impossible.

Offending data

Here is an export of table structure:

-- Table Definition ----------------------------------------------

CREATE TABLE guides.preferences (
    id uuid DEFAULT gen_random_uuid() PRIMARY KEY,
    user_id uuid NOT NULL REFERENCES users.users(id),
    guide_id uuid NOT NULL REFERENCES users.users(id),
    created_at timestamp without time zone NOT NULL,
    updated_at timestamp without time zone NOT NULL,
    undeleted boolean DEFAULT true,
    deleted_at timestamp without time zone,
    position integer NOT NULL CHECK ("position" >= 0),
    completed_meetings_count integer NOT NULL DEFAULT 0,
    CONSTRAINT must_concurrently_set_deleted_at_and_undeleted CHECK (undeleted IS TRUE AND deleted_at IS NULL OR undeleted IS NULL AND deleted_at IS NOT NULL),
    CONSTRAINT preferences_guide_id_user_id_undeleted_unique UNIQUE (guide_id, user_id, undeleted),
    CONSTRAINT preferences_user_id_position_undeleted_unique UNIQUE (user_id, position, undeleted) DEFERRABLE INITIALLY DEFERRED
);
COMMENT ON COLUMN guides.preferences.undeleted IS 'Set simultaneously with deleted_at to flag this as deleted or undeleted';
COMMENT ON COLUMN guides.preferences.deleted_at IS 'Set simultaneously with deleted_at to flag this as deleted or undeleted';

-- Indices -------------------------------------------------------

CREATE UNIQUE INDEX preferences_pkey ON guides.preferences(id uuid_ops);
CREATE UNIQUE INDEX preferences_user_id_position_undeleted_unique ON guides.preferences(user_id uuid_ops,position int4_ops,undeleted bool_ops);
CREATE INDEX index_preferences_on_user_id_and_guide_id ON guides.preferences(user_id uuid_ops,guide_id uuid_ops);
CREATE UNIQUE INDEX preferences_guide_id_user_id_undeleted_unique ON guides.preferences(guide_id uuid_ops,user_id uuid_ops,undeleted bool_ops);

We're really stumped by this, and hope that someone might be able to help us. Thank you!

1

There are 1 answers

0
Craig Ulliott On

I found it the reason! We have been building a lot of new functionality over the last few months, and have been running lots of migrations to change schema and update data. Because of all the triggers and functions in our database, it often makes sense to temporarily disable triggers. We do this with “set session_replication_role = ‘replica’;”.

Turns out that this also disables all deferrable constraints, because deferrable constraints and foreign keys are trigger based. As you can see from the schema in my question, the unique constraint in question is set as deferrable.

Mystery solved!