Duplicate records are getting inserted in table violating the primary key constraint in postgresql 15.0

68 views Asked by At

I'm using 15.0 version of PostgreSQL. I am facing a strange issue where duplicate data is getting inserted in table violating the primary key. Where as not all the query is showing the duplicate records. But after doing one VACUUM (FULL) on the table the duplicate row started showing in all the queries.

I'm seeing this 2nd time here. Previously txndatetime was a part of composite primary key. I thought this may be due to the fact that Postgres might be storing the timestamp as epoch which is a real number in Postgres. We changed it as a long column dt_epoch. Still we are seeing this. BTW the table is a hash partitioned table .

The version:

select version();
version
PostgreSQL 15.0 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-44), 64-bit

The table structure:

\d my_table
Partitioned table "my_table"
Column Type Collation Nullable Default
grp_id character varying(100) not null
txn_id character varying(100) not null
rec_dt date not null
creation_ts timestamp without time zone not null
created_by_user_id character varying(50) not null
crtd_by_client_id character varying(50) not null
txndatetime timestamp with time zone not null
rrn character varying(150)
dt_epoch bigint not null
Partition key: HASH (grp_id)

Indexes:
   "my_table_pkey" PRIMARY KEY, btree (grp_id, txn_id, dt_epoch)
   "my_table_grp_id_idx" btree (grp_id)
   "my_table_creation_ts_idx" btree (creation_ts)
   "my_table_txn_id_idx" btree (txn_id)
Number of partitions: 8 (Use \d+ to list them.)

The record that showing the duplicate results:

select grp_id, txn_id, txndatetime, dt_epoch, rec_dt 
from my_table 
where rec_dt = '2024-02-21';
grp_id txn_id txndatetime dt_epoch rec_dt
XYZ210220241522100001 2698e117-1cd1-4705-97cb-8a0fc65907d 2024-02-21 15:21:13+05:30 1708509073000 2024-02-21
XYZ210220241522100001 07da7715-e67e-4ff2-a000-9f2e999fbd8 2024-02-21 15:21:13+05:30 1708509073000 2024-02-21
XYZ210220241532100001 7b68ca16-c955-4526-9a40-124b1b215b2 2024-02-21 15:31:39+05:30 1708509699000 2024-02-21
XYZ210220241532100001 7b68ca16-c955-4526-9a40-124b1b215b2 2024-02-21 15:31:39+05:30 1708509699000 2024-02-21
XYZ210220241532100001 9cc5e877-27ff-49ee-812c-1b0b48e8418 2024-02-21 15:31:39+05:30 1708509699000 2024-02-21
XYZ210220241246100001 7f105788-89f3-4742-a642-43b551ab6e0 2024-02-21 12:45:33+05:30 1708499733000 2024-02-21

Whereas after limiting it with rec_dt, grp_id and txn_id it shows only one row:

select * from my_table 
where rec_dt = '2024-02-21' 
and grp_id='XYZ210220241532100001' 
and txn_id='7b68ca16-c955-4526-9a40-124b1b215b2';
grp_id txn_id rec_dt creation_ts created_by_user_id created_by_org_id txndatetime rrn dt_epoch
XYZ210220241532100001 7b68ca16-c955-4526-9a40-124b1b215b2 2024-02-21 2024-02-21 15:32:31.465 1056832970 NPCI 2024-02-21 15:31:39+05:30 405215100006 1708509699000

Again with grp_id search it is not showing the said row multiple times:

select grp_id, txn_id, txndatetime, dt_epoch, rec_dt 
from my_table 
where  grp_id='XYZ210220241532100001';
grp_id txn_id txndatetime dt_epoch rec_dt
XYZ210220241532100001 7b68ca16-c955-4526-9a40-124b1b215b2 2024-02-21 15:31:39+05:30 1708509699000 2024-02-21
XYZ210220241532100001 9cc5e877-27ff-49ee-812c-1b0b48e8418 2024-02-21 15:31:39+05:30 1708509699000 2024-02-21

Now after doing a VACUUM (FULL) on that table all the query started showing the duplicate rows:

DGschema=# VACUUM (FULL) DGschema.my_table;
VACUUM
select grp_id, txn_id, txndatetime, dt_epoch, rec_dt 
from my_table 
where rec_dt = '2024-02-21';
grp_id txn_id txndatetime dt_epoch rec_dt
XYZ210220241522100001 2698e117-1cd1-4705-97cb-8a0fc65907d 2024-02-21 15:21:13+05:30 1708509073000 2024-02-21
XYZ210220241522100001 07da7715-e67e-4ff2-a000-9f2e999fbd8 2024-02-21 15:21:13+05:30 1708509073000 2024-02-21
XYZ210220241532100001 7b68ca16-c955-4526-9a40-124b1b215b2 2024-02-21 15:31:39+05:30 1708509699000 2024-02-21
XYZ210220241532100001 7b68ca16-c955-4526-9a40-124b1b215b2 2024-02-21 15:31:39+05:30 1708509699000 2024-02-21
XYZ210220241532100001 9cc5e877-27ff-49ee-812c-1b0b48e8418 2024-02-21 15:31:39+05:30 1708509699000 2024-02-21
ABC210220241246100001 7f105788-89f3-4742-a642-43b551ab6e0 2024-02-21 12:45:33+05:30 1708499733000 2024-02-21
select grp_id, txn_id, txndatetime, dt_epoch, rec_dt, creation_ts 
from my_table 
where  grp_id='XYZ210220241532100001';
grp_id txn_id txndatetime dt_epoch rec_dt creation_ts
XYZ210220241532100001 7b68ca16-c955-4526-9a40-124b1b215b2 2024-02-21 15:31:39+05:30 1708509699000 2024-02-21 2024-02-21 15:32:31.465
XYZ210220241532100001 7b68ca16-c955-4526-9a40-124b1b215b2 2024-02-21 15:31:39+05:30 1708509699000 2024-02-21 2024-02-21 15:32:36.511
XYZ210220241532100001 9cc5e877-27ff-49ee-812c-1b0b48e8418 2024-02-21 15:31:39+05:30 1708509699000 2024-02-21 2024-02-21 15:32:50.487

This is dangerous bug in Postgres which is defying the basics and making PostgreSQL unreliable. Is it a bug in this PostgreSQL version?

0

There are 0 answers