Quickly estimating average row size for FILLFACTOR tuning in Postgres 13

827 views Asked by At

Quick Size Estimate?

I'm looking into FILLFACTOR tuning, and am consequently trying to figure out how to calculate average row sizes in Postgres. I've used this thread as a starting point:

https://dba.stackexchange.com/questions/23879/measure-the-size-of-a-postgresql-table-row

Not surprisingly, the most accurate methods take a long time, and I'm wondering if there's a way to get a reasonably accurate estimate quickly? And, for the purposes of FILLFACTOR, what is the best thing to measure? Seems like index and TOAST sizes don't enter into it.

What I've tried so far:

  • A multi-result function based on Erwin Brandstetter's detailed example from the thread cited above, named table_get_info here. Slow, but detailed and accurate.

  • AVG(pg_column_size(table_name.*)), also from that thread, implemented here as table_get_row_size_estimate. Slow, but not as slow.

  • avg(length(table_name::text) with TABLESAMPLE, implemented here as table_get_row_length_estimate. Variable speed...accuracy depends on sample/luck?

Sample Query

I know this is inefficient, that's fine for this test. Just trying to get some comparitive results.

SELECT relname,

    (select bytes_per_row
         from table_get_info ('data',relname)
        where metric = 'core_relation_size'
      ) as core_relation_size,

     (select bytes_per_row
         from table_get_info ('data',relname)
        where metric = 'live_rows_in_text_representation'
      ) as live_rows_in_text,

     (select * from table_get_row_size_estimate('data',relname)
     ) as table_get_row_size,

     (select * from table_get_row_length_estimate('data',relname)
     ) as table_get_row_length

FROM pg_stat_user_tables
WHERE relname IN  (
    'activity',
    'analytic_productivity',
    'analytic_scan',
    'analytic_sterilizer_load',
    'analytic_sterilizer_loadinv',
    'analytic_work',
    'assembly',
    'data_file_info',
    'inv',
    'item',
    'print_job',
    'q_event')

order by 1;

Results

relname          core_relation_size  live_rows_in_text table_get_row_size   table_get_row_length
activity                        199                321                177                    322
analytic_productivity           364                553                329                    554
analytic_scan                   275                401                258                    402
analytic_sterilizer_load        220                379                208                    380
analytic_sterilizer_loadinv     366                603                324                    603
analytic_work                   407                662                359                    662
assembly                        284                466                263                    466
data_file_info               36,864             26,382              7,215                 23,722
inv                             324                486                281                    487
item                            653                966                572                    967
print_job                       223                309                208                    304
q_event                         349                611                320                    612

The results for live_rows_in_text and table_get_row_length are quite similar as they're doing approximately the same thing. This is slow as Postgres has to test a lot, or all of the, rows. The estimate (far right column) uses TABLESAMPLE but it'st still pretty slow.

Is there a quick alternative tht's good enough for FILLFACTOR estimation? And, if not, what measure makes the most sense for FILLFACTOR estimation?

I've included the code for each of the functions used above next.

table_get_info

Raises the question of what exactly to check for when tuning FILLFACTOR.

CREATE OR REPLACE FUNCTION dba.table_get_info(schema_name_in text, table_name_in text)
  RETURNS TABLE (
    metric         text,
    bytes          int8,
    bytes_pretty   text,
    bytes_per_row  int8
)

LANGUAGE plpgsql AS

$BODY$

DECLARE
v_schema_name  text := quote_ident(schema_name_in);
v_table_name   text := quote_ident(table_name_in);

-- Erwin Brandstetter
-- https://dba.stackexchange.com/questions/23879/measure-the-size-of-a-postgresql-table-ROW

BEGIN

RAISE NOTICE 'Table: %.%: ', v_schema_name,  v_table_name;

RETURN QUERY EXECUTE

'SELECT l.metric,
        l.nr AS bytes
     , CASE WHEN is_size THEN pg_size_pretty(nr) END AS bytes_pretty
     , CASE WHEN is_size THEN nr / NULLIF(x.ct, 0) END AS bytes_per_row
FROM  (
   SELECT min(tableoid)        AS tbl
        , count(*)             AS ct
        , sum(length(t::text)) AS txt_len  -- length in characters
   FROM   ' ||  v_schema_name || '.' || v_table_name ||  ' t
   ) x
CROSS  JOIN LATERAL (
   VALUES
     (true , ''core_relation_size''               , pg_relation_size(tbl))
   , (true , ''visibility_map''                   , pg_relation_size(tbl, ''vm''))
   , (true , ''free_space_map''                   , pg_relation_size(tbl, ''fsm''))
   , (true , ''table_size_incl_toast''            , pg_table_size(tbl))
   , (true , ''indexes_size''                     , pg_indexes_size(tbl))
   , (true , ''total_size_incl_toast_and_indexes'', pg_total_relation_size(tbl))
   , (true , ''live_rows_in_text_representation'' , txt_len)
   , (false, ''row_count''                        , ct)
   , (false, ''live_tuples''                      , pg_stat_get_live_tuples(tbl))
   , (false, ''dead_tuples''                      , pg_stat_get_dead_tuples(tbl))
   ) l(is_size, metric, nr)'

    USING v_schema_name, v_table_name;

END
$BODY$;

table_get_row_size_estimate

I tried TABLESAMPLE here, and it caused no errors. Didn't speed anyting up though.

CREATE FUNCTION dba.table_get_row_size_estimate(schema_name_in text, table_name_in text)
  RETURNS int8

LANGUAGE plpgsql AS

$BODY$

DECLARE
v_schema_name  text := quote_ident(schema_name_in);
v_table_name   text := quote_ident(table_name_in);

v_row_size_estimate real := 0;

BEGIN

RAISE NOTICE 'Table: %.%: ', v_schema_name,  v_table_name;

-- SELECT AVG(pg_column_size(table_name.*)) FROM tablename; –
EXECUTE
  FORMAT ('SELECT AVG(pg_column_size(' || v_schema_name || '.' || v_table_name || '.*)) FROM ' || v_table_name || ';')
  USING v_schema_name,  v_table_name
  INTO v_row_size_estimate;
RETURN v_row_size_estimate;

END
$BODY$;

table_get_row_length_estimate

I'm trying this to get access to TABLESAMPLE. I figure 8% is a good default for a reasonable estimate.

DROP FUNCTION IF EXISTS dba.table_get_row_length_estimate(text, text, int);

CREATE FUNCTION dba.table_get_row_length_estimate(
     schema_name_in       text,
     table_name_in        text,
     sample_percentage_in int default 8)

  RETURNS int8

LANGUAGE plpgsql AS

$BODY$

DECLARE
v_schema_name  text := quote_ident(schema_name_in);
v_table_name   text := quote_ident(table_name_in);

v_row_length_estimate real := 0;

BEGIN

RAISE NOTICE 'Inputs: %.%: (%) ', v_schema_name,  v_table_name, sample_percentage_in;

/*
select avg(length(activity::text)) from data.activity tablesample bernoulli(8)
*/

EXECUTE FORMAT (
  '    select avg(length(' || v_table_name || '::text))
       from ' || v_table_name || '
    tablesample bernoulli(' || sample_percentage_in || ');')
  USING v_schema_name,  v_table_name, sample_percentage_in
  INTO v_row_length_estimate;

RETURN v_row_length_estimate;

END
$BODY$;
0

There are 0 answers