create table the_table (
id integer, root_id integer, parent_id integer, status text, ts timestamp, comment text);
insert into the_table values
(1, null, null, 'COMPLETE', now()-'92d'::interval, '>90 days old, complete, standalone'),
(2, null, null, 'COMPLETE', now()-'92d'::interval, '>90 days old, complete, root of 3,4'),
(3, 2, null, 'COMPLETE', now()-'92d'::interval, '>90 days old, complete, child of 2, parent of 4'),
(4, 2, 3, 'OPEN', now()-'92d'::interval, '>90 days old, open, child of 2,3'),
(5, null, null, 'COMPLETE', now()-'92d'::interval, '>90 days old, complete, root of 6,7'),
(6, 5, null, 'COMPLETE', now()-'92d'::interval, '>90 days old, complete, child of 5, parent of 4'),
(7, 5, 6, 'COMPLETE', now()-'10d'::interval, '<=90 days old, complete, child of 5,6' )
(8, null, null, 'COMPLETE', now()-'10d'::interval, '<=90 days old, complete, standalone'),
(9, null, null, 'COMPLETE', now()-'10d'::interval, '<=90 days old, complete, root of 10'),
(10,9, null, 'COMPLETE', now()-'92d'::interval, '>90 days old, complete, child of 9' ),
(11,11, null, 'COMPLETE', now()-'92d'::interval, '>90 days old, complete, parent/child of self'),
(12,null, 12, 'COMPLETE', now()-'92d'::interval, '>90 days old, complete, parent/child of self'),
(13,14, null, 'COMPLETE', now()-'92d'::interval, '>90 days old, complete, cross-parent/child of 14'),
(14,13, null, 'COMPLETE', now()-'92d'::interval, '>90 days old, complete, cross-parent/child of 13'),
(15,null, null, 'COMPLETE', now()-'92d'::interval, '>90 days old, complete, parent of 16,17'),
(16,null, 15, 'COMPLETE', now()-'92d'::interval, '>90 days old, complete, child of 15'),
(17,null, 15, 'OPEN', now()-'10d'::interval, '<=90 days old, open, child of 15');
I want to delete all records which are
- older than 90 days
- AND in
COMPLETE
status. - AND not related to rows that aren't older than 90 days and
COMPLETE
, directly or indirectly
Examples:
the row with
id=1
should be deleted.id=2
andid=3
areCOMPLETE
and older than 90 days but sinceid=4
isOPEN
and hasroot_id=2
andparent_id=3
, these three rows (id in (2,3,4)
) should not be deleted.id=5
,id=6
andid=7
are allCOMPLETE
butid=7
is not older than 90 days and hasroot_id=5
andparent_id=6
, so these three rows (id=5
,id=6
andid=7
) should not be deleted.id=15
andid=16
are both old enough andCOMPLETE
butid=17
is stillOPEN
and younger than 90 days.16
is not directly related to17
but they share15
as parent so these three rows should not be deleted.
I tried as per this answer, but there is one case failing: a removable parent with two children, one non-removable, one removable. (Last test case here fiddle)
While deleting, I would like to keep the entire group until all its members become removable. I tried using recursive CTE, but still I am not able to achieve it. Below is the query I tried:
WITH recursive RecursiveHierarchy AS (
SELECT id, root_id, parent_id, status, ts, comment
FROM the_table
WHERE status = 'OPEN'
UNION ALL
SELECT t.id, t.root_id, t.parent_id, t.status, t.ts, t.comment
FROM the_table t
JOIN RecursiveHierarchy r ON ( ( 'OPEN'=t.status
OR now()-t.ts <= '90d'::interval )
AND ( t.id IN(r.root_id,r.parent_id)
OR r.id IN(t.root_id,t.parent_id) ) )
)
DELETE FROM the_table
WHERE ts < NOW() - '90 days'::interval
AND status = 'COMPLETE'
AND id NOT IN (SELECT id FROM RecursiveHierarchy)
RETURNING *;
I tried using FUNCTION as well
CREATE OR REPLACE FUNCTION delete_old_records()
RETURNS VOID AS $$
DECLARE
record_to_delete the_table;
BEGIN
FOR record_to_delete IN SELECT * FROM the_table
WHERE ts < NOW() - '90 days'::interval AND status = 'COMPLETE'
LOOP
IF NOT EXISTS (
SELECT 1 FROM the_table
WHERE id = record_to_delete.id
OR root_id = record_to_delete.id
OR parent_id = record_to_delete.id
) THEN
DELETE FROM the_table WHERE id = record_to_delete.id;
END IF;
END LOOP;
END;
$$ LANGUAGE plpgsql;
DELETE
except aWHERE id NOT IN()
against the list of protected non-removables.Demo at db<>fiddle
You can also take the opposite route: find all immediate removables and search their relations to make sure they aren't linked to something non-removable: demo2
Note that whatever approach you take, this is doomed to be fairly complex because of the recursive nature of the problem. In the worst case, you could have 15k records, each one related to the next, all of which are removable, and only the last one turns out to be non-removable. The entire set would still have to be checked one-by-one, which takes time and memory in this structure. What you can do is change the structure and introduce a group_id/family_id: mark all rows in a family with a common id and make a trigger maintain it. Then, searching for non-removable relation becomes trivial.
If you really have to do complex graph traversal on db, consider
pgrouting
(example) orapache-age
.