Quick Size Estimate?
I'm looking into FILLFACTOR tuning, and am consequently trying to figure out how to calculate average row sizes in Postgres. I've used this thread as a starting point:
https://dba.stackexchange.com/questions/23879/measure-the-size-of-a-postgresql-table-row
Not surprisingly, the most accurate methods take a long time, and I'm wondering if there's a way to get a reasonably accurate estimate quickly? And, for the purposes of FILLFACTOR, what is the best thing to measure? Seems like index and TOAST sizes don't enter into it.
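One shortcut I'm evaluating (a sketch, not verified for accuracy): derive bytes-per-row from the planner statistics in pg_class, which reads no table data at all. This assumes reltuples is reasonably fresh from autovacuum/ANALYZE, and it counts page overhead and free space as part of the row, so it should overstate the pure tuple size somewhat:

```sql
-- Sketch: estimate average heap row size from planner statistics alone.
-- Assumes a recent ANALYZE/autovacuum; reltuples is only an estimate.
SELECT n.nspname,
       c.relname,
       pg_relation_size(c.oid) AS heap_bytes,  -- main fork only, no TOAST or indexes
       c.reltuples::int8       AS est_rows,
       (pg_relation_size(c.oid) / NULLIF(c.reltuples, 0))::int8 AS est_bytes_per_row
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE n.nspname = 'data'
  AND c.relkind = 'r';
```

Since this is instant regardless of table size, it might be "good enough" here even if it's biased high.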
What I've tried so far:
- A multi-result function based on Erwin Brandstetter's detailed example from the thread cited above, named table_get_info here. Slow, but detailed and accurate.
- AVG(pg_column_size(table_name.*)), also from that thread, implemented here as table_get_row_size_estimate. Slow, but not as slow.
- avg(length(table_name::text)) with TABLESAMPLE, implemented here as table_get_row_length_estimate. Variable speed...accuracy depends on sample/luck?
Sample Query
I know this is inefficient; that's fine for this test. I'm just trying to get some comparative results.
SELECT relname,
(select bytes_per_row
from table_get_info ('data',relname)
where metric = 'core_relation_size'
) as core_relation_size,
(select bytes_per_row
from table_get_info ('data',relname)
where metric = 'live_rows_in_text_representation'
) as live_rows_in_text,
(select * from table_get_row_size_estimate('data',relname)
) as table_get_row_size,
(select * from table_get_row_length_estimate('data',relname)
) as table_get_row_length
FROM pg_stat_user_tables
WHERE relname IN (
'activity',
'analytic_productivity',
'analytic_scan',
'analytic_sterilizer_load',
'analytic_sterilizer_loadinv',
'analytic_work',
'assembly',
'data_file_info',
'inv',
'item',
'print_job',
'q_event')
order by 1;
Results
relname                      core_relation_size  live_rows_in_text  table_get_row_size  table_get_row_length
activity                                    199                321                 177                   322
analytic_productivity                       364                553                 329                   554
analytic_scan                               275                401                 258                   402
analytic_sterilizer_load                    220                379                 208                   380
analytic_sterilizer_loadinv                 366                603                 324                   603
analytic_work                               407                662                 359                   662
assembly                                    284                466                 263                   466
data_file_info                           36,864             26,382               7,215                23,722
inv                                         324                486                 281                   487
item                                        653                966                 572                   967
print_job                                   223                309                 208                   304
q_event                                     349                611                 320                   612
The results for live_rows_in_text and table_get_row_length are quite similar, as they're doing approximately the same thing. It's slow because Postgres has to test most, or all, of the rows. The estimate (far right column) uses TABLESAMPLE, but it's still pretty slow.
Is there a quick alternative that's good enough for FILLFACTOR estimation? And, if not, what measure makes the most sense for FILLFACTOR estimation?
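For context on why I care about the exact number: my understanding of how row size maps into FILLFACTOR, sketched below. The constants are assumptions on my part (8192-byte pages, 24-byte page header, roughly 28 bytes of per-tuple overhead for the aligned tuple header plus line pointer), and the 200-byte data size is just an illustrative stand-in:

```sql
-- Sketch of the rows-per-page arithmetic at various FILLFACTOR settings.
-- Assumes 8 KB pages, 24-byte page header, ~28 bytes tuple overhead;
-- 200 is an example average data-bytes-per-row, not a measured value.
SELECT ff AS fillfactor,
       floor(((8192 - 24) * ff / 100.0) / (200 + 28)) AS est_rows_per_page
FROM (VALUES (100), (90), (70)) AS t(ff);
```

If that arithmetic is right, the measure I need is the on-page tuple size, i.e. heap bytes only, which supports my hunch that index and TOAST sizes don't enter into it.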
I've included the code for each of the functions used above next.
table_get_info
Raises the question of what exactly to check for when tuning FILLFACTOR.
CREATE OR REPLACE FUNCTION dba.table_get_info(schema_name_in text, table_name_in text)
RETURNS TABLE (
metric text,
bytes int8,
bytes_pretty text,
bytes_per_row int8
)
LANGUAGE plpgsql AS
$BODY$
DECLARE
v_schema_name text := quote_ident(schema_name_in);
v_table_name text := quote_ident(table_name_in);
-- Erwin Brandstetter
-- https://dba.stackexchange.com/questions/23879/measure-the-size-of-a-postgresql-table-ROW
BEGIN
RAISE NOTICE 'Table: %.%: ', v_schema_name, v_table_name;
RETURN QUERY EXECUTE
'SELECT l.metric,
l.nr AS bytes
, CASE WHEN is_size THEN pg_size_pretty(nr) END AS bytes_pretty
, CASE WHEN is_size THEN nr / NULLIF(x.ct, 0) END AS bytes_per_row
FROM (
SELECT min(tableoid) AS tbl
, count(*) AS ct
, sum(length(t::text)) AS txt_len -- length in characters
FROM ' || v_schema_name || '.' || v_table_name || ' t
) x
CROSS JOIN LATERAL (
VALUES
(true , ''core_relation_size'' , pg_relation_size(tbl))
, (true , ''visibility_map'' , pg_relation_size(tbl, ''vm''))
, (true , ''free_space_map'' , pg_relation_size(tbl, ''fsm''))
, (true , ''table_size_incl_toast'' , pg_table_size(tbl))
, (true , ''indexes_size'' , pg_indexes_size(tbl))
, (true , ''total_size_incl_toast_and_indexes'', pg_total_relation_size(tbl))
, (true , ''live_rows_in_text_representation'' , txt_len)
, (false, ''row_count'' , ct)
, (false, ''live_tuples'' , pg_stat_get_live_tuples(tbl))
, (false, ''dead_tuples'' , pg_stat_get_dead_tuples(tbl))
) l(is_size, metric, nr)';
END
$BODY$;
table_get_row_size_estimate
I tried TABLESAMPLE here, and it caused no errors. It didn't speed anything up, though.
CREATE FUNCTION dba.table_get_row_size_estimate(schema_name_in text, table_name_in text)
RETURNS int8
LANGUAGE plpgsql AS
$BODY$
DECLARE
v_schema_name text := quote_ident(schema_name_in);
v_table_name text := quote_ident(table_name_in);
v_row_size_estimate real := 0;
BEGIN
RAISE NOTICE 'Table: %.%: ', v_schema_name, v_table_name;
-- SELECT AVG(pg_column_size(table_name.*)) FROM table_name;
EXECUTE
format('SELECT AVG(pg_column_size(t.*)) FROM %s.%s t', v_schema_name, v_table_name)
INTO v_row_size_estimate;
RETURN v_row_size_estimate;
END
$BODY$;
table_get_row_length_estimate
I'm trying this to get access to TABLESAMPLE. I figure 8% is a good default for a reasonable estimate.
DROP FUNCTION IF EXISTS dba.table_get_row_length_estimate(text, text, int);
CREATE FUNCTION dba.table_get_row_length_estimate(
schema_name_in text,
table_name_in text,
sample_percentage_in int default 8)
RETURNS int8
LANGUAGE plpgsql AS
$BODY$
DECLARE
v_schema_name text := quote_ident(schema_name_in);
v_table_name text := quote_ident(table_name_in);
v_row_length_estimate real := 0;
BEGIN
RAISE NOTICE 'Inputs: %.%: (%) ', v_schema_name, v_table_name, sample_percentage_in;
/*
select avg(length(activity::text)) from data.activity tablesample bernoulli(8)
*/
EXECUTE format(
'select avg(length(t::text))
from %s.%s as t
tablesample bernoulli(%s);',
v_schema_name, v_table_name, sample_percentage_in)
INTO v_row_length_estimate;
RETURN v_row_length_estimate;
END
$BODY$;
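One variation I may try next (an untested sketch): TABLESAMPLE SYSTEM samples whole pages rather than individual rows, so at the same percentage it reads far fewer blocks than BERNOULLI and should be much faster. The trade-off I'd expect is worse accuracy when row sizes cluster by page. Using the activity table from above as the guinea pig:

```sql
-- Sketch: page-level sampling instead of row-level; faster, possibly
-- less accurate if row sizes correlate with page position.
SELECT avg(length(t::text)) AS est_row_length
FROM data.activity AS t TABLESAMPLE system (1);
```

If SYSTEM at 1% tracks the BERNOULLI numbers closely, that might be the quick answer I'm after.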