Optimize query for columns with distinct values incl. NULL

Question

Optimize query for columns with distinct values incl. NULL

82 views Asked by CCM At 07 March 2024 at 06:36

ID	Amount	Brand
1	10	NULL
1	20	NULL
2	30	Mazada
2	NULL	BMW
3	40	NULL
3	40	KIA
4	NULL	Honda
4	NULL	Honda

Above is the original table, and my goal is to find any difference within the same ID for each column.

For every ID group, if there is any difference in a value column, return a new row with the column name.

The outcome should be like this:

ID	Difference
1	Amount
2	Amount
2	Brand
3	Brand

My code for PostgreSQL:

SELECT
    ID,
    'Amount' AS Difference
FROM
    table
GROUP BY
    ID
HAVING
    COUNT(DISTINCT amount) > 1
    OR (COUNT(amount) != COUNT(*) AND COUNT(DISTINCT amount) > 0)

UNION ALL 

SELECT
    ID,
    'Brand' AS Difference
FROM
    table
GROUP BY
    ID
HAVING
    COUNT(DISTINCT brand) > 1
    OR (COUNT(brand) != COUNT(*) AND COUNT(DISTINCT brand) > 0)

Be careful that there is NULL value in the record, so that the aggregation function would not count the NULL value. The NULL vs NULL view it as same, but NULL vs any value view it as difference.

Is there any better or faster query for this solution? Like lateral join or window function?

Original Q&A

There are 1 answers

**Erwin Brandstetter** · Answer 1 · 2024-03-07T08:55:23+00:00

A single pass over the table should be quite a bit faster:

SELECT id, col
FROM  (
   SELECT id
        , min(amount) <> max(amount) AS a_diff, bool_or(true) FILTER (WHERE amount IS NULL) AS a_null
        , min(brand)  <> max(brand)  AS b_diff, bool_or(true) FILTER (WHERE brand  IS NULL) AS b_null
   FROM   tbl
   GROUP  BY id
   ) t
JOIN   LATERAL (
   VALUES
     ('Amount', a_diff, a_null)
   , ('Brand' , b_diff, b_null)
   ) x(col, diff, has_null) ON (diff OR NOT diff AND has_null)
ORDER  BY id, col

fiddle

Plus, count(DISTINCT ...) is notoriously expensive. And we don't actually need that count. We just need to know if the min value differs from the max (or does not, but there is also at least one NULL).

Very similar case with more explanation:

Optimize query for columns with distinct values per ID

Looks like you have seen that one, but not yet applied to your case.
Like I mentioned over there: there are faster ways for many rows per group ...

TechQA.

Optimize query for columns with distinct values incl. NULL

There are 1 answers

Related Questions in SQL

Related Questions in POSTGRESQL

Related Questions in POSTGRESQL-PERFORMANCE

Popular Questions

Trending Questions