Why have Postgres tables doubled in size after cancelling a DELETE?

130 views Asked by At

I ran a simple delete query on a telemetries table to clear up some data but cancelled as it was taking really long. I noticed that after I cancelled the query some of the tables (all partitions of the telemetries table) are now taking up more than double the space they previously were. For example, a table that was previously 16GB is now 28GB (pg_size_pretty value).

I do not want to proceed with the delete because I want to keep the data but why have the tables swelled up so much? Would a VACUUM FULL help? If I run VACUUM, do I run the risk of actually deleting the data permanently?

I ran the following using psql;

delete from telemtries where timestamp < 1668987465000;

I cancelled with ctrl+c. I ran \s to check my query history aftwerwards and my delete query was not there. I'm using postgress 12.16

2

There are 2 answers

2
Erwin Brandstetter On

DELETE does not inflate tables. (UPDATE could, like Frank commented.)

Aside from triggers that might do anything ON DELETE, the only remaining thing that comes to mind would be a FOREIGN KEY constraint with ON DELETE SET NULL or ... SET DEFAULT referencing the target table of your DELETE. That would actually write new row versions for referencing rows, which linger as dead tuples in case of a ROLLBACK. Either way, it would bloat those tables, but not the target table of the DELETE (unless it's a self-referencing FK).

(Might also explain why the DELETE took so long. But there is a whole range of other possible reasons for that. You have not been overly specific.)

VACUUM FULL removes all bloat. See:

1
FilipA On

In PostgreSQL, canceling a long-running delete query might leave behind some dead (or obsolete) rows that were marked for deletion but were not yet physically removed from the table. These dead rows could contribute to the increase in table size.

VACUUM FULL requires an exclusive lock on the table, which means other transactions cannot access the table during the operation. This could lead to downtime for your application.

Instead of VACUUM FULL, you might want to try a regular VACUUM first. It's less aggressive and has fewer locking and performance implications.

VACUUM <your_table_name>;

After running vacuum, consider running ANALYZE to update statistics, which can help the query planner make better decisions about how to execute queries.

ANALYZE <your_table_name>;

If you're concerned about potential data loss, it's always a good idea to have a recent backup before performing maintenance operations on your database. This way, you can restore the data if anything unexpected happens.