Count how many percent of values on each column are nulls
Is there a way, through the information_schema or otherwise, to calculate what percentage of the values in each column of a table (or, better yet, a set of tables) are NULL?
Asked by James Brown
There are 4 answers
OK, I played around a little and made a query that returns a query (or several queries, if you use LIKE 'my_table%' instead of = 'my_table_name'):
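-- Builds one SELECT statement per matching table; each output column holds the fraction of NULLs in one table column.
SELECT 'select '|| string_agg('(count(*)::real-count('||column_name||')::real)/count(*)::real as '||column_name||'_percentage ', ', ') || 'from ' || table_name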
FROM information_schema.columns
WHERE table_name LIKE 'my_table_name'
GROUP BY table_name
It returns a ready-to-run SQL query, like:
"SELECT (count(*)::real-count(id)::real)/count(*)::real AS id_percentage , (count(*)::real-count(value)::real)/count(*)::real AS value_percentage FROM my_table_name"
Running that query gives one fraction (between 0 and 1) per column:
id_percentage;value_percentage
0;0.0177515
(The keywords in the sample above are capitalized for readability; the generator itself emits them in lowercase.)
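For a single column whose name is known up front, the expression the generated query relies on can also be written by hand. This is a minimal sketch that reuses the table my_table_name and the column value from the sample above. Scaling to a percentage and the NULLIF guard for empty tables are additions that are not part of the answer's query.

```sql
-- count(*) counts every row; count(value) counts only rows where value IS NOT NULL,
-- so the difference is the number of NULLs in the column.
SELECT count(*)                                                AS total_rows,
       count(*) - count(value)                                 AS null_rows,
       -- NULLIF turns a row count of 0 into NULL, so an empty table yields NULL
       -- instead of a division-by-zero error (an addition in this sketch).
       100.0 * (count(*) - count(value)) / NULLIF(count(*), 0) AS value_null_pct
FROM   my_table_name;
```

On this scale, the sample fraction of 0.0177515 corresponds to roughly 1.78 percent.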
In PostgreSQL, you can read this from the statistics tables, provided your autovacuum setting is on (check it with SHOW autovacuum; or SHOW ALL;). You can also set the vacuum interval to configure how fast your statistics tables should be updated. The NULL fraction of each column (null_frac, a value between 0 and 1) is then available with the query below. Keep in mind that these values are estimates collected by the last ANALYZE, not exact counts:
select attname, null_frac from pg_stats where tablename = 'table_name'
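A slightly fuller variant, as a sketch: it refreshes the statistics first, restricts the lookup to one schema, and scales null_frac to a percentage. The schema name public and the output alias null_pct are assumptions chosen for illustration; table_name is the placeholder used above.

```sql
-- Refresh the planner statistics for the table so pg_stats is current.
ANALYZE table_name;

-- null_frac is a real between 0 and 1; cast to numeric so round() accepts a scale.
SELECT schemaname,
       tablename,
       attname,
       round((null_frac * 100)::numeric, 2) AS null_pct
FROM   pg_stats
WHERE  schemaname = 'public'        -- assumed schema
AND    tablename  = 'table_name'
ORDER  BY null_pct DESC;
```

Because ANALYZE works from a sample of the rows on large tables, the result is an approximation. Use a count-based query when exact figures are needed.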
Your query has a number of problems. Most importantly, you are not escaping identifiers (which could lead to exceptions at best or SQL injection attacks in the worst case), and you are not taking the schema into account. Use instead:
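One way to address both points is to build the statement with format() and the %I specifier, which quotes identifiers where needed, and to filter and group on table_schema as well. The following is a minimal sketch of that idea and not necessarily the query this answer posted. The schema public, the pattern my_table%, the _null_pct suffix, and the generated_query alias are all assumptions.

```sql
-- Generate one schema-qualified SELECT per matching table.
SELECT format(
         'SELECT %s FROM %I.%I',
         string_agg(
           -- %1$I is the quoted column name, %2$I the quoted output alias.
           format('round(100.0 * (count(*) - count(%1$I)) / NULLIF(count(*), 0), 2) AS %2$I',
                  column_name, column_name::text || '_null_pct'),
           ', ' ORDER BY ordinal_position),
         table_schema, table_name) AS generated_query
FROM   information_schema.columns
WHERE  table_schema = 'public'          -- assumed schema
AND    table_name LIKE 'my_table%'      -- assumed table name pattern
GROUP  BY table_schema, table_name;
```

In psql, ending the statement with \gexec instead of a semicolon runs each generated query directly.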
Produces a query like:
Closely related answer on dba.SE with a lot more details: