HOT and FILLFACTOR Results
I've got some high-UPDATE tables where I've adjusted the FILLFACTOR
to 95%, and I'm checking back in on them. I don't think that I've got the settings right, and am unclear how to tune them intelligently. I took another pass through Laurenz Albe's helpful blog post on HOT updates
https://www.cybertec-postgresql.com/en/hot-updates-in-postgresql-for-better-performance/
... and the clear source code READ ME:
https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/backend/access/heap/README.HOT
Below is a query, adapted from the blog post, to check the status of the tables in the system, along with some sample output:
SELECT relname,
n_tup_upd as total_update_count,
n_tup_hot_upd as hot_update_count,
coalesce(div_safe(n_tup_upd, n_tup_hot_upd),0) as total_by_hot,
coalesce(div_safe(n_tup_hot_upd, n_tup_upd),0) as hot_by_total
FROM pg_stat_user_tables
order by 4 desc;
A few results:
relname total_update_count hot_update_count total_by_hot hot_by_total
rollups 369418 128 2886.0781 0.00034649097
q_event 71781 541 132.68207 0.007536813
analytic_scan 2104727 34304 61.35515 0.016298551
clinic 4424 77 57.454544 0.017405063
facility_location 179636 6489 27.683157 0.03612305
target_snapshot 494 18 27.444445 0.036437247
inv 1733021 78234 22.151762 0.045143135
I'm unsure what ratio(s) I'm looking for here. Can anyone advise me how to read these results, or what to read to figure out how to interpret them?
Are These UPDATEs HOTable?
I didn't address this basic point in the original draft of this question. I checked my patch from a few months back, and I ran SET (fillfactor = 95)
and then VACUUM (FULL, VERBOSE, ANALYZE)
on 13 of my tables. (The VERBOSE
is in there as I had some tables that couldn't VACUUM
because of a months-old process that needed clearing out, and that's how I found the problem. pg_stat_activity
is my friend.)
However, at least most touch an indexed column...but with an identical value. Like 1 = 1
, so no change to the value. I've been thinking that that is HOTable. If I'm wrong about that, bummer. If not, I'm mostly hoping to clarify what exactly the goal is for the relationships amongst fillfactor
, n_tup_upd
, and n_tup_hot_upd
.
SELECT relname,
n_tup_upd as total_update_count,
n_tup_hot_upd as hot_update_count,
coalesce(div_safe(n_tup_upd, n_tup_hot_upd),0) as total_by_hot,
coalesce(div_safe(n_tup_hot_upd, n_tup_upd),0) as hot_by_total,
(select value::integer from table_get_options('data',relname) where option = 'fillfactor') as fillfactor_setting
FROM pg_stat_user_tables
WHERE relname IN (
'activity',
'analytic_productivity',
'analytic_scan',
'analytic_sterilizer_load',
'analytic_sterilizer_loadinv',
'analytic_work',
'assembly',
'data_file_info',
'inv',
'item',
'print_job',
'q_event')
order by 4 desc;
Results:
+-----------------------------+--------------------+------------------+----------------------+----------------------+--------------------+
| relname | total_update_count | hot_update_count | total_divided_by_hot | hot_divided_by_total | fillfactor_setting |
+-----------------------------+--------------------+------------------+----------------------+----------------------+--------------------+
| q_event | 71810 | 553 | 129.85533 | 0.0077008773 | 95 |
+-----------------------------+--------------------+------------------+----------------------+----------------------+--------------------+
| analytic_scan | 2109206 | 34536 | 61.072678 | 0.016373934 | 95 |
+-----------------------------+--------------------+------------------+----------------------+----------------------+--------------------+
| inv | 1733176 | 78387 | 22.110502 | 0.045227375 | 95 |
+-----------------------------+--------------------+------------------+----------------------+----------------------+--------------------+
| item | 630586 | 32110 | 19.638306 | 0.05092089 | 95 |
+-----------------------------+--------------------+------------------+----------------------+----------------------+--------------------+
| analytic_sterilizer_loadinv | 76976539 | 5206806 | 14.783831 | 0.06764147 | 95 |
+-----------------------------+--------------------+------------------+----------------------+----------------------+--------------------+
| analytic_work | 8117050 | 608847 | 13.331839 | 0.07500841 | 95 |
+-----------------------------+--------------------+------------------+----------------------+----------------------+--------------------+
| assembly | 90580 | 7281 | 12.4405985 | 0.08038198 | 95 |
+-----------------------------+--------------------+------------------+----------------------+----------------------+--------------------+
| analytic_sterilizer_load | 19249 | 2997 | 6.422756 | 0.1556964 | 95 |
+-----------------------------+--------------------+------------------+----------------------+----------------------+--------------------+
| activity | 3795 | 711 | 5.3375525 | 0.18735178 | 95 |
+-----------------------------+--------------------+------------------+----------------------+----------------------+--------------------+
| analytic_productivity | 106486 | 25899 | 4.1115875 | 0.24321507 | 95 |
+-----------------------------+--------------------+------------------+----------------------+----------------------+--------------------+
| print_job | 1414 | 388 | 3.6443298 | 0.27439886 | 95 |
+-----------------------------+--------------------+------------------+----------------------+----------------------+--------------------+
| data_file_info | 402086 | 285663 | 1.4075537 | 0.7104525 | 90 |
+-----------------------------+--------------------+------------------+----------------------+----------------------+--------------------+
(I just looked for and found an on-line table generator to help out with this kind of example at https://www.tablesgenerator.com/text_tables. It's a bit awkward to use, but faster than building out monospaced aligned text manually.)
FILLFACTOR and HOT update ratio
Figured I could sort this out a bit by adapting Laurenz Albe's code from https://www.cybertec-postgresql.com/en/hot-updates-in-postgresql-for-better-performance/. All I've done here is make a script that builds out a table with a FILLFACTOR
of 10, 20, 30.....100% and then updated it in the same way for each percentage. Every time the table is created, it is populated with 256 records, which are then updated 10 times each. The update sets a non-indexed field back to itself, so no value actually changes:
UPDATE mytable SET val = val;
Below are the results:
+------------+---------------+-------------+--------------+
| FILLFACTOR | total_updates | hot_updates | total_to_hot |
+------------+---------------+-------------+--------------+
| 10 | 2350 | 2350 | 1.00 |
+------------+---------------+-------------+--------------+
| 20 | 2350 | 2350 | 1.00 |
+------------+---------------+-------------+--------------+
| 30 | 2350 | 2350 | 1.00 |
+------------+---------------+-------------+--------------+
| 40 | 2350 | 2350 | 1.00 |
+------------+---------------+-------------+--------------+
| 50 | 2350 | 2223 | 1.06 |
+------------+---------------+-------------+--------------+
| 60 | 2350 | 2188 | 1.07 |
+------------+---------------+-------------+--------------+
| 70 | 2350 | 1883 | 1.25 |
+------------+---------------+-------------+--------------+
| 80 | 2350 | 1574 | 1.49 |
+------------+---------------+-------------+--------------+
| 90 | 2350 | 1336 | 1.76 |
+------------+---------------+-------------+--------------+
| 100 | 2350 | 987 | 2.38 |
+------------+---------------+-------------+--------------+
From this, it seems that when the total_to_hot
ratio rises, there may be a benefit to increasing the FILLFACTOR
.
https://www.postgresql.org/docs/13/monitoring-stats.html
n_tup_upd
counts all updates, including HOT updates, and n_tup_hot_upd
counts HOT updates only. But it doesn't seem to be a count of "could have been a HOT update, if we hadn't run out of room on the page." That would be great, but it also seems like a lot to ask for. (And maybe more expensive to keep track of that can be justified?)
Here is the script. I edited and re-ran the test with each FILLFACTOR
.
-- Set up the table for the test
DROP TABLE IF EXISTS mytable;
CREATE TABLE mytable (
id integer PRIMARY KEY,
val integer NOT NULL
) WITH (autovacuum_enabled = off);
-- Change the FILLFACTOR. The default is 100.
ALTER TABLE mytable SET (fillfactor = 10); -- The only part that changes between runs.
-- Seed the data
INSERT INTO mytable
SELECT *, 0
FROM generate_series(1, 235) AS n;
-- Thrash the data
UPDATE mytable SET val = val;
SELECT pg_sleep(1);
UPDATE mytable SET val = val;
SELECT pg_sleep(1);
UPDATE mytable SET val = val;
SELECT pg_sleep(1);
UPDATE mytable SET val = val;
SELECT pg_sleep(1);
UPDATE mytable SET val = val;
SELECT pg_sleep(1);
UPDATE mytable SET val = val;
SELECT pg_sleep(1);
UPDATE mytable SET val = val;
SELECT pg_sleep(1);
UPDATE mytable SET val = val;
SELECT pg_sleep(1);
UPDATE mytable SET val = val;
SELECT pg_sleep(1);
UPDATE mytable SET val = val;
SELECT pg_sleep(1);
-- How does it look?
SELECT n_tup_upd as total_updates,
n_tup_hot_upd as hot_updates,
div_safe(n_tup_upd, n_tup_hot_upd) as total_to_hot
FROM pg_stat_user_tables
WHERE relname = 'mytable';
Checking the FILLFACTOR Setting
As a side-note, I wanted a quick call to check the FILLFACTOR
setting on a table, and it turned out to be more involved than I thought. I wrote up a function that works, but could likely see some improvements...if anyone has suggestions. I call it like this:
select * from table_get_options('foo', 'bar');
or
select * from table_get_options('foo','bar') where option = 'fillfactor';
Here's the code, if anyone has improvements to offer:
CREATE OR REPLACE FUNCTION dba.table_get_options(text,text)
RETURNS TABLE (
schema_name text,
table_name text,
option text,
value text
)
LANGUAGE SQL AS
$BODY$
WITH
packed_options AS (
select pg_class.relname as table_name,
btrim(pg_options_to_table(pg_class.reloptions)::text, '()') as option_kvp -- Convert to text (fillfactor,95), then strip off ( and )
from pg_class
join pg_namespace
on pg_namespace.oid = pg_class.relnamespace
where pg_namespace.nspname = $1
and relname = $2
and reloptions is not null
),
unpacked_options AS (
select $1 as schema_name,
$2 as table_name,
split_part(option_kvp, ',', 1) as option,
split_part(option_kvp, ',', 2) as value
from packed_options
)
select * from unpacked_options;
$BODY$;
The numbers show that your strategy is not working, and the overwhelming majority of updates are not HOT. You also show the reason: Even if you update an indexed column to the original value, you won't get a HOT update.
The solution would be to differentiate by including the indexed column in the
UPDATE
statement only if it is really modified.A
fillfactor
of 95 is also pretty high, unless you have tables with really small rows. Perhaps you would get better results with a setting like 90 or 85.