Unnest (pivot) arrays in Postgres pg_stats view

392 views Asked by At

I would like to pivot the arrays in the cells inside the Postgres view pg_stats. The reason for this is to better understand what is going on with the table EXPLAIN ANALYZE commands.

I have this SQL query that I am modifying without success in order to achieve the pivot:

SELECT 
    -- ps.schemaname,
    -- ps.tablename,
    ps.attname as "column_name",
    ps.null_frac,
    ps.avg_width,
    ps.n_distinct,
    ps.most_common_vals,
    ps.most_common_freqs,
    ps.histogram_bounds,
    ps.correlation -- , -- I think this means how close the stats are to the "real rows in the entire table"
    -- unnest(ps.most_common_vals, ps.histogram_bounds)
FROM pg_stats AS ps
CROSS JOIN LATERAL unnest(ps.most_common_vals, ps.histogram_bounds)
WHERE 
        ps.tablename = 'my_table_name'
    AND ps.attname IN ('my_important_column_1', 'my_important_column_2');

However I hit the wall on all sorts of error messages due to the function unnest(...) (see https://www.postgresql.org/docs/9.4/functions-array.html):

  • If I use the function unnest(...) in a column inside the SELECT then I see this error:
SQL Error [42883]: ERROR: function unnest(anyarray, anyarray) does not exist
  Hint: No function matches the given name and argument types. You might need to add explicit type casts.
  Position: 325
  • If I use unnest(...) in the FROM with that CROSS JOIN LATERAL, then I see this error:
SQL Error [42804]: ERROR: cannot determine element type of "anyarray" argument

On the types issues, I am interested in these columns of various types "array" (see https://www.postgresql.org/docs/current/view-pg-stats.html):

  • pg_stats.most_common_vals type anyarray
  • pg_stats.most_common_freqs type float4[]
  • pg_stats.histogram_bounds type anyarray

If I use all of these columns inside unnest(...) then because types are different (basically because of float4[] I think), I see this error:

SQL Error [42883]: ERROR: function unnest(anyarray, real[], anyarray) does not exist
  Hint: No function matches the given name and argument types. You might need to add explicit type casts.
  Position: 387

Or

SQL Error [42804]: ERROR: cannot determine element type of "anyarray" argument

depending on where I put unnest (as column, or as filter).

  • How do I pivot all these arrays inside column values?
  • How do I use casting to apply this pivot to arrays of different types?

I am a bit lost with the Postgres syntax here.

EDIT:

I could do stuff like this in the SELECT: array_to_string(ps.most_common_vals, E'\n') as most_common_vals_lf,

But this leads to "fat cells" that don't render very well in a visual IDE e.g. DBeaver because the line feeds don't expand very well and all the output rows hide most of the cell value just because they don't expand. I really would like to pivot the view and multiply the common columns by the number N of elements in the arrays and get back all these as N rows.

1

There are 1 answers

0
TPPZ On BEST ANSWER

Turns out the issue was around casting the types of those arrays. This works when making sure the arrays types are ::text::text[]:

SELECT 
    -- ps.schemaname,
    -- ps.tablename,
    ps.attname as "column_name",
    mt.mcv AS "most_common_vals",
    mt.mcf AS "most_common_freqs",
    ps.avg_width,
    ps.n_distinct,
    ps.correlation,
    ps.null_frac -- fraction of column entries that are null
FROM pg_stats AS ps
-- for scalar type columns:
CROSS JOIN LATERAL unnest(
    ps.most_common_vals::text::text[], 
    ps.most_common_freqs::text::text[]
) AS mt(mcv, mcf)
WHERE
        ps.tablename = 'my_table_name'
    AND ps.attname IN ('my_important_column_1', 'my_important_column_2')
    AND mt.mcv IS NOT NULL
    AND mt.mcf IS NOT NULL
ORDER BY ps.attname, mt.mcf DESC;

The output is the expected pivot of the original view pg_stats with one row for each element of those arrays.

One gotcha on ps_stats: the pairs of most_common_vals and most_common_freqs can be treated as a map. So in this case it makes sense to pivot the table with both pivoted columns.

On the other hand the histogram column histogram_bounds is independent from the above. It just shows the 10 bands to go from 0% to 100% in steps of 10%. It doesn't make sense to mix this column with the 2 columns above. This histogram column very much depend on the data we are dealing with, sometimes it doesn't even make sense to take this information into consideration at all.