Tuning FILLFACTOR for high-UPDATE tables in Postgres 13

4.6k views Asked by At

HOT and FILLFACTOR Results

I've got some high-UPDATE tables where I've adjusted the FILLFACTOR to 95%, and I'm checking back in on them. I don't think that I've got the settings right, and am unclear how to tune them intelligently. I took another pass through Laurenz Albe's helpful blog post on HOT updates

https://www.cybertec-postgresql.com/en/hot-updates-in-postgresql-for-better-performance/

... and the clear source code READ ME:

https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/backend/access/heap/README.HOT

Below is a query, adapted from the blog post, to check the status of the tables in the system, along with some sample output:

SELECT relname,
       n_tup_upd                                          as total_update_count,
       n_tup_hot_upd                                      as hot_update_count,
       coalesce(div_safe(n_tup_upd,     n_tup_hot_upd),0) as total_by_hot,
       coalesce(div_safe(n_tup_hot_upd, n_tup_upd),0)     as hot_by_total
                                      
FROM pg_stat_user_tables
order by 4 desc;

A few results:

relname            total_update_count    hot_update_count    total_by_hot   hot_by_total
rollups                        369418                 128    2886.0781      0.00034649097
q_event                         71781                 541    132.68207      0.007536813
analytic_scan                 2104727               34304     61.35515      0.016298551
clinic                           4424                  77     57.454544     0.017405063
facility_location              179636                6489     27.683157     0.03612305
target_snapshot                   494                  18     27.444445     0.036437247
inv                           1733021               78234     22.151762     0.045143135

I'm unsure what ratio(s) I'm looking for here. Can anyone advise me how to read these results, or what to read to figure out how to interpret them?

Are These UPDATEs HOTable?

I didn't address this basic point in the original draft of this question. I checked my patch from a few months back, and I ran SET (fillfactor = 95) and then VACUUM (FULL, VERBOSE, ANALYZE) on 13 of my tables. (The VERBOSE is in there as I had some tables that couldn't VACUUM because of a months-old process that needed clearing out, and that's how I found the problem. pg_stat_activity is my friend.)

However, at least most touch an indexed column...but with an identical value. Like 1 = 1, so no change to the value. I've been thinking that that is HOTable. If I'm wrong about that, bummer. If not, I'm mostly hoping to clarify what exactly the goal is for the relationships amongst fillfactor, n_tup_upd, and n_tup_hot_upd.

SELECT relname,
       n_tup_upd                                          as total_update_count,
       n_tup_hot_upd                                      as hot_update_count,
       coalesce(div_safe(n_tup_upd,     n_tup_hot_upd),0) as total_by_hot,
       coalesce(div_safe(n_tup_hot_upd, n_tup_upd),0)     as hot_by_total,
       (select value::integer from table_get_options('data',relname) where option = 'fillfactor') as fillfactor_setting
                         
FROM pg_stat_user_tables
WHERE relname IN  (
    'activity',
    'analytic_productivity',
    'analytic_scan',
    'analytic_sterilizer_load',
    'analytic_sterilizer_loadinv',
    'analytic_work',
    'assembly',
    'data_file_info',
    'inv',
    'item',
    'print_job',
    'q_event')
    
order by 4 desc;

Results:

+-----------------------------+--------------------+------------------+----------------------+----------------------+--------------------+
| relname                     | total_update_count | hot_update_count | total_divided_by_hot | hot_divided_by_total | fillfactor_setting |
+-----------------------------+--------------------+------------------+----------------------+----------------------+--------------------+
| q_event                     |              71810 |              553 |            129.85533 |         0.0077008773 |                 95 |
+-----------------------------+--------------------+------------------+----------------------+----------------------+--------------------+
| analytic_scan               |            2109206 |            34536 |            61.072678 |          0.016373934 |                 95 |
+-----------------------------+--------------------+------------------+----------------------+----------------------+--------------------+
| inv                         |            1733176 |            78387 |            22.110502 |          0.045227375 |                 95 |
+-----------------------------+--------------------+------------------+----------------------+----------------------+--------------------+
| item                        |             630586 |            32110 |            19.638306 |           0.05092089 |                 95 |
+-----------------------------+--------------------+------------------+----------------------+----------------------+--------------------+
| analytic_sterilizer_loadinv |           76976539 |          5206806 |            14.783831 |           0.06764147 |                 95 |
+-----------------------------+--------------------+------------------+----------------------+----------------------+--------------------+
| analytic_work               |            8117050 |           608847 |            13.331839 |           0.07500841 |                 95 |
+-----------------------------+--------------------+------------------+----------------------+----------------------+--------------------+
| assembly                    |              90580 |             7281 |           12.4405985 |           0.08038198 |                 95 |
+-----------------------------+--------------------+------------------+----------------------+----------------------+--------------------+
| analytic_sterilizer_load    |              19249 |             2997 |             6.422756 |            0.1556964 |                 95 |
+-----------------------------+--------------------+------------------+----------------------+----------------------+--------------------+
| activity                    |               3795 |              711 |            5.3375525 |           0.18735178 |                 95 |
+-----------------------------+--------------------+------------------+----------------------+----------------------+--------------------+
| analytic_productivity       |             106486 |            25899 |            4.1115875 |           0.24321507 |                 95 |
+-----------------------------+--------------------+------------------+----------------------+----------------------+--------------------+
| print_job                   |               1414 |              388 |            3.6443298 |           0.27439886 |                 95 |
+-----------------------------+--------------------+------------------+----------------------+----------------------+--------------------+
| data_file_info              |             402086 |           285663 |            1.4075537 |            0.7104525 |                 90 |
+-----------------------------+--------------------+------------------+----------------------+----------------------+--------------------+

(I just looked for and found an on-line table generator to help out with this kind of example at https://www.tablesgenerator.com/text_tables. It's a bit awkward to use, but faster than building out monospaced aligned text manually.)

FILLFACTOR and HOT update ratio

Figured I could sort this out a bit by adapting Laurenz Albe's code from https://www.cybertec-postgresql.com/en/hot-updates-in-postgresql-for-better-performance/. All I've done here is make a script that builds out a table with a FILLFACTOR of 10, 20, 30.....100% and then updated it in the same way for each percentage. Every time the table is created, it is populated with 256 records, which are then updated 10 times each. The update sets a non-indexed field back to itself, so no value actually changes:

UPDATE mytable SET val = val;

Below are the results:

+------------+---------------+-------------+--------------+
| FILLFACTOR | total_updates | hot_updates | total_to_hot |
+------------+---------------+-------------+--------------+
|         10 |          2350 |        2350 |         1.00 |
+------------+---------------+-------------+--------------+
|         20 |          2350 |        2350 |         1.00 |
+------------+---------------+-------------+--------------+
|         30 |          2350 |        2350 |         1.00 |
+------------+---------------+-------------+--------------+
|         40 |          2350 |        2350 |         1.00 |
+------------+---------------+-------------+--------------+
|         50 |          2350 |        2223 |         1.06 |
+------------+---------------+-------------+--------------+
|         60 |          2350 |        2188 |         1.07 |
+------------+---------------+-------------+--------------+
|         70 |          2350 |        1883 |         1.25 |
+------------+---------------+-------------+--------------+
|         80 |          2350 |        1574 |         1.49 |
+------------+---------------+-------------+--------------+
|         90 |          2350 |        1336 |         1.76 |
+------------+---------------+-------------+--------------+
|        100 |          2350 |         987 |         2.38 |
+------------+---------------+-------------+--------------+

From this, it seems that when the total_to_hot ratio rises, there may be a benefit to increasing the FILLFACTOR.

https://www.postgresql.org/docs/13/monitoring-stats.html

n_tup_upd counts all updates, including HOT updates, and n_tup_hot_upd counts HOT updates only. But it doesn't seem to be a count of "could have been a HOT update, if we hadn't run out of room on the page." That would be great, but it also seems like a lot to ask for. (And maybe more expensive to keep track of that can be justified?)

Here is the script. I edited and re-ran the test with each FILLFACTOR.

-- Set up the table for the test
DROP TABLE IF EXISTS mytable;

CREATE TABLE mytable (
   id  integer PRIMARY KEY,
   val integer NOT NULL
) WITH (autovacuum_enabled = off);
 
-- Change the FILLFACTOR. The default is 100.
ALTER TABLE mytable SET (fillfactor = 10);  -- The only part that changes between runs.

-- Seed the data
INSERT INTO mytable
SELECT *, 0
FROM generate_series(1, 235) AS n;

-- Thrash the data
UPDATE mytable SET val = val;
SELECT pg_sleep(1);

UPDATE mytable SET val = val;
SELECT pg_sleep(1);

UPDATE mytable SET val = val;
SELECT pg_sleep(1);

UPDATE mytable SET val = val;
SELECT pg_sleep(1);

UPDATE mytable SET val = val;
SELECT pg_sleep(1);

UPDATE mytable SET val = val;
SELECT pg_sleep(1);

UPDATE mytable SET val = val;
SELECT pg_sleep(1);

UPDATE mytable SET val = val;
SELECT pg_sleep(1);

UPDATE mytable SET val = val;
SELECT pg_sleep(1);

UPDATE mytable SET val = val;
SELECT pg_sleep(1);

-- How does it look?
SELECT n_tup_upd                          as total_updates,
       n_tup_hot_upd                      as hot_updates, 
       div_safe(n_tup_upd, n_tup_hot_upd) as total_to_hot
      
  FROM pg_stat_user_tables 
  
 WHERE relname = 'mytable';

Checking the FILLFACTOR Setting

As a side-note, I wanted a quick call to check the FILLFACTOR setting on a table, and it turned out to be more involved than I thought. I wrote up a function that works, but could likely see some improvements...if anyone has suggestions. I call it like this:

select * from table_get_options('foo', 'bar');

or

select * from table_get_options('foo','bar') where option = 'fillfactor';

Here's the code, if anyone has improvements to offer:

CREATE OR REPLACE FUNCTION dba.table_get_options(text,text)
  RETURNS TABLE (
    schema_name text,
    table_name  text,
    option      text,
    value       text
)

LANGUAGE SQL AS

$BODY$
WITH
packed_options AS (
select pg_class.relname as table_name,
       btrim(pg_options_to_table(pg_class.reloptions)::text, '()') as option_kvp -- Convert to text (fillfactor,95), then strip off ( and )

 from pg_class
 join pg_namespace
   on pg_namespace.oid = pg_class.relnamespace

where pg_namespace.nspname = $1
  and relname              = $2
  and reloptions    is not null
),

unpacked_options AS (
select $1                             as schema_name,
       $2                             as table_name,
       split_part(option_kvp, ',', 1) as option,
       split_part(option_kvp, ',', 2) as value

    from packed_options
)

select * from unpacked_options;

$BODY$;
1

There are 1 answers

2
Laurenz Albe On BEST ANSWER

The numbers show that your strategy is not working, and the overwhelming majority of updates are not HOT. You also show the reason: Even if you update an indexed column to the original value, you won't get a HOT update.

The solution would be to differentiate by including the indexed column in the UPDATE statement only if it is really modified.

A fillfactor of 95 is also pretty high, unless you have tables with really small rows. Perhaps you would get better results with a setting like 90 or 85.