I have encountered several occasions where my read query is stuck for hours and when checked on pg_stat_activity
it has wait_event_type
of IOWait
. It also happens that every time this happens, there is an active autovacuum running on that table. The table is a partitioned table using pg_partman and I am using Postgres 11.
Simplified version of query
SELECT *
FROM bookings
WHERE user_id=? AND user_type=?
There is an index for the query, something like
CREATE index_user_id_user_type ON bookings(user_id, user_type)
I noticed that there is an optimization on Postgres 14 with regards to autovacuum on partitioned table (https://www.postgresql.org/docs/14/release-14.html) which made me more suspicious that autovacuum is indeed the reason for this stuck/hang query with IOWait
Autovacuum now analyzes partitioned tables (Yuzuko Hosoya, Álvaro Herrera)
Insert, update, and delete tuple counts from partitions are now propagated to their parent tables so autovacuum knows when to process them.
Could this be caused by autovacuum?
- If yes, why? Is there a way to avoid this?
- If no, what could be the reason?
Edit(1) Added table schema & explain plan
Table schema
CREATE TABLE public.bookings (
order_number text not null,
event_timestamp with time zone not null,
customer_id text not null,
driver_id text,
...
) PARTITION BY RANGE (event_timestamp);
Explain plan: https://explain.depesz.com/s/y3YH
The wait event proves that you are waiting for I/O.
VACUUM
does a lot of I/O. These two facts together strongly point in the direction of an overloaded I/O system.If on Linux, you can verify that by watching the I/O wait percentage of the CPU time in
vmstat 1
: if the value is consistently over 10, you have a proof.Either improve the I/O capacity or slow down autovacuum by lowering
autovacuum_vacuum_cost_limit
. I recommend the first, because if autovacuum is too slow, you can end up with even worse problems.