We've noticed one of our tables growing considerably on PG 12. This table is the target of very frequent updates, with a mix of column types, including a very large text
column (often with over 50kb of data) - we run a local cron job that looks for rows older than X time and set the text
column to a null value (as we no longer need the data for that particular column after X amount of time).
We understand this does not actually free up disk space due to the MVCC model, but we were hoping that auto-vacuum would take care of this. To our surprise, the table continues to grow (now over 40gb worth) without auto-vacuum running. Running a vacuum manually has addressed the issue and we no longer see growth.
This has lead me to investigate other tables, I'm realising that I don't understand how auto-vacuum is triggered at all.
Here is my understanding of how it works, which hopefully someone can pick apart:
- I look for tables that have a large amount of dead tuples in them:
select * from pg_stat_all_tables ORDER BY n_dead_tup desc;
- I identify
tableX
with 33169557 dead tuples (n_dead_tup column). - I run a
select * from pg_class ORDER BY reltuples desc;
to check how many estimated rows there are on tabletableX
- I identify 1725253 rows via the
reltuples
column. - I confirm my autovacuum settings:
autovacuum_vacuum_threshold = 50
andautovacuum_vacuum_scale_factor = 0.2
- I apply the formula
threshold + pg_class.reltuples * scale_factor
, so,50 + 1725253 * 0.2
which returns 345100.6
It is my understanding that auto-vacuum will start on this table once ~345100 dead tuples are found. But tableX
is already at a whopping 33169557 dead tuples!, The last_autovacuum on this table was back in February.
Any clarification would be welcome.
Your algorithm is absolutely correct.
Here are some reasons why things could go wrong:
autovacuum runs, but is so slow that it never gets done
If you see no running autovacuum, that is not your problem.
autovacuum runs, but a long running open transaction prevents it from removing dead tuples
other tables need to be vacuumed more urgently (to avoid transaction ID wraparound), so the three workers are busy with other things
autovacuum runs, but conflicts with high concurrent locks on the table (
LOCK TABLE
,ALTER TABLE
, ...)This makes autovacuum give up and try again later.
autovacuum is disabled, perhaps only for that table