| ID | Amount | Brand |
|---|---|---|
| 1 | 10 | NULL |
| 1 | 20 | NULL |
| 2 | 30 | Mazada |
| 2 | NULL | BMW |
| 3 | 40 | NULL |
| 3 | 40 | KIA |
| 4 | NULL | Honda |
| 4 | NULL | Honda |
Above is the original table, and my goal is to find any difference within the same ID for each column.
For every ID group, if there is any difference in a value column, return a new row with the column name.
The outcome should be like this:
| ID | Difference |
|---|---|
| 1 | Amount |
| 2 | Amount |
| 2 | Brand |
| 3 | Brand |
My code for PostgreSQL:
SELECT
ID,
'Amount' AS Difference
FROM
table
GROUP BY
ID
HAVING
COUNT(DISTINCT amount) > 1
OR (COUNT(amount) != COUNT(*) AND COUNT(DISTINCT amount) > 0)
UNION ALL
SELECT
ID,
'Brand' AS Difference
FROM
table
GROUP BY
ID
HAVING
COUNT(DISTINCT brand) > 1
OR (COUNT(brand) != COUNT(*) AND COUNT(DISTINCT brand) > 0)
Be careful that there is NULL value in the record, so that the aggregation function would not count the NULL value. The NULL vs NULL view it as same, but NULL vs any value view it as difference.
Is there any better or faster query for this solution? Like lateral join or window function?
A single pass over the table should be quite a bit faster:
fiddle
Plus,
count(DISTINCT ...)is notoriously expensive. And we don't actually need that count. We just need to know if the min value differs from the max (or does not, but there is also at least oneNULL).Very similar case with more explanation:
Looks like you have seen that one, but not yet applied to your case.
Like I mentioned over there: there are faster ways for many rows per group ...