Count what percentage of values in each column are NULL

Is there a way, through the information_schema or otherwise, to calculate what percentage of the values in each column of a table (or, better yet, a set of tables) are NULL?

There are 4 answers

Erwin Brandstetter (best answer)

Your query has a number of problems. Most importantly, you are not escaping identifiers (which could lead to errors at best, or SQL injection in the worst case), and you are not taking the schema into account. Use this instead:

SELECT 'SELECT ' || string_agg(concat('round(100 - 100 * count(', col
                  , ') / count(*)::numeric, 2) AS ', col_pct), E'\n     , '
                  ORDER BY ordinal_position)  -- guarantees column order
    || E'\nFROM   ' ||  tbl
FROM (
   SELECT quote_ident(table_schema) || '.' || quote_ident(table_name) AS tbl
        , quote_ident(column_name) AS col
        , quote_ident(column_name || '_pct') AS col_pct
        , ordinal_position
   FROM   information_schema.columns
   WHERE  table_name = 'my_table_name'
   ) sub
GROUP  BY tbl;

Produces a query like:

SELECT round(100 - 100 * count(id) / count(*)::numeric, 2) AS id_pct
     , round(100 - 100 * count(day) / count(*)::numeric, 2) AS day_pct
     , round(100 - 100 * count("oDd X") / count(*)::numeric, 2) AS "oDd X_pct"
FROM   public.my_table_name;
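To see what the generator produces without a database, here is a minimal Python sketch that assembles the same query text. The quote_ident below is a hypothetical, simplified stand-in for PostgreSQL's function (it does not detect reserved keywords); in real use, let PostgreSQL's own quote_ident() or format('%I') do the quoting. The column names are the ones from the example output.

```python
import re
from typing import List

# Hypothetical, simplified stand-in for PostgreSQL's quote_ident(): names made
# only of lower-case letters, digits and underscores (not starting with a digit)
# stay bare; anything else is double-quoted with embedded quotes doubled.
# Unlike the real function, reserved keywords are NOT detected here.
_PLAIN_IDENT = re.compile(r"[a-z_][a-z0-9_]*")


def quote_ident(name: str) -> str:
    if _PLAIN_IDENT.fullmatch(name):
        return name
    return '"' + name.replace('"', '""') + '"'


def build_null_pct_query(schema: str, table: str, columns: List[str]) -> str:
    """Assemble the same SELECT text as the SQL generator query above."""
    if not columns:
        raise ValueError("need at least one column")
    tbl = quote_ident(schema) + "." + quote_ident(table)
    items = [
        f"round(100 - 100 * count({quote_ident(col)}) / count(*)::numeric, 2)"
        f" AS {quote_ident(col + '_pct')}"
        for col in columns
    ]
    # Same separators as E'\n     , ' and E'\nFROM   ' in the SQL generator.
    return "SELECT " + "\n     , ".join(items) + "\nFROM   " + tbl


if __name__ == "__main__":
    print(build_null_pct_query("public", "my_table_name", ["id", "day", "oDd X"]) + ";")
```

Only the identifier quoting varies per column; the arithmetic template is fixed, which is why the SQL version can build it with a single string_agg().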

Closely related answer on dba.SE with a lot more details:

newman

I don't think there is a built-in feature for this, but you can compute it yourself.

Walk through each column of the table and compute count(*) for all rows and the count of rows where that column is NULL.

This can also be optimized into a single query per table.
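A minimal sketch of this approach, assuming Python's built-in sqlite3 module and an SQLite database instead of PostgreSQL: SQLite's pragma_table_info() stands in for information_schema.columns, and every column is aggregated in one SELECT so the table is scanned once. null_percentages is a hypothetical helper name.

```python
import sqlite3
from typing import Dict, Optional


def _quote(name: str) -> str:
    # Double-quote an SQL identifier, doubling any embedded quotes.
    return '"' + name.replace('"', '""') + '"'


def null_percentages(conn: sqlite3.Connection, table: str) -> Dict[str, Optional[float]]:
    """Return {column: % of NULL values} for `table` using one aggregate query.

    Hypothetical helper; for an empty table every value is None (0 rows).
    """
    # Step 1: walk the columns. pragma_table_info() plays the role that
    # information_schema.columns plays in PostgreSQL.
    cols = [r[0] for r in conn.execute(
        "SELECT name FROM pragma_table_info(?) ORDER BY cid", (table,))]
    if not cols:
        raise ValueError(f"no such table: {table}")
    # Step 2: for every column, NULL rows / all rows, scaled to a percentage,
    # all in a single SELECT.
    exprs = ", ".join(
        f"100.0 * SUM(CASE WHEN {_quote(c)} IS NULL THEN 1 ELSE 0 END) / COUNT(*)"
        for c in cols
    )
    row = conn.execute(f"SELECT {exprs} FROM {_quote(table)}").fetchone()
    return dict(zip(cols, row))


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (id INTEGER, note TEXT)")
    conn.executemany("INSERT INTO t VALUES (?, ?)",
                     [(1, "a"), (2, None), (3, None), (4, "d")])
    print(null_percentages(conn, "t"))  # {'id': 0.0, 'note': 50.0}
```

The 100.0 literal forces floating-point division; with plain integers the ratio would be truncated.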

James Brown

OK, I played around a little and wrote a query that generates a query (or several queries, one per table, if you use LIKE 'my_table%' instead of = 'my_table_name'):

SELECT 'select '|| string_agg('(count(*)::real-count('||column_name||')::real)/count(*)::real as '||column_name||'_percentage ', ', ') || 'from ' || table_name
FROM information_schema.columns
WHERE table_name LIKE 'my_table_name'
GROUP BY table_name

It returns a ready-to-run SQL query, like:

SELECT (count(*)::real-count(id)::real)/count(*)::real AS id_percentage , (count(*)::real-count(value)::real)/count(*)::real AS value_percentage FROM my_table_name

Running that query returns:

id_percentage;value_percentage
0;0.0177515

(I adjusted the capitalization by hand for readability, so it does not exactly match the generated text.)
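Two details of this query are worth spelling out. In PostgreSQL, count() returns bigint, and bigint divided by bigint is integer division, so without the ::real casts (or the ::numeric cast in the best answer) any ratio below 1 would come out as 0. Also, the result is a fraction between 0 and 1 rather than a percentage: the 0.0177515 above means about 1.78 % NULLs. The sketch below reproduces both effects with Python's built-in sqlite3 module as a stand-in for PostgreSQL (SQLite also truncates integer division, and CAST(... AS REAL) plays the role of ::real); the table, data and helper names are hypothetical.

```python
import sqlite3


def null_ratio_variants(conn: sqlite3.Connection, table: str, column: str):
    """Return (integer division, fraction, percentage) of NULLs in one column.

    Hypothetical demo helper: identifiers are interpolated unquoted, so only
    pass trusted names.
    """
    sql = f"""
        SELECT (COUNT(*) - COUNT({column})) / COUNT(*),
               CAST(COUNT(*) - COUNT({column}) AS REAL) / COUNT(*),
               100.0 * (COUNT(*) - COUNT({column})) / COUNT(*)
        FROM {table}
    """
    return conn.execute(sql).fetchone()


def make_demo_table() -> sqlite3.Connection:
    # 8 hypothetical rows; every 4th value is NULL, i.e. 2 of 8 = 25 % NULLs.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE my_table_name (id INTEGER, value REAL)")
    conn.executemany(
        "INSERT INTO my_table_name VALUES (?, ?)",
        [(i, None if i % 4 == 0 else i * 1.5) for i in range(1, 9)],
    )
    return conn


if __name__ == "__main__":
    conn = make_demo_table()
    # Integer division truncates to 0, the cast gives a fraction, 100.0 * gives percent.
    print(null_ratio_variants(conn, "my_table_name", "value"))  # (0, 0.25, 25.0)
```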

Ali Naderi

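In PostgreSQL, you can easily get this from the planner statistics in the pg_stats view. They are refreshed by ANALYZE, which autovacuum runs automatically when it is enabled (check with SHOW autovacuum;). You can tune the autovacuum settings to control how often the statistics are updated, or run ANALYZE on the table yourself. Because ANALYZE samples rows, for large tables null_frac (the fraction of NULL entries, between 0 and 1) is an estimate rather than an exact count. Query it like this: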

SELECT attname, null_frac, round(null_frac::numeric * 100, 2) AS null_pct
FROM   pg_stats
WHERE  tablename = 'table_name';