I am trying to run a SQL query to find a 50th percentile in a table within a certain group, but then i am also grouping the result over the same field. Here is my query, for example over the tpch's nation table:
SELECT
regionkey,
PERCENTILE_CONT(0.5) WITHIN GROUP(ORDER BY regionkey)
FROM "tpch/nation"
GROUP BY regionkey
ORDER BY regionkey
Is this a valid query? I am trying to QA my code by running all kinds of different percentile queries. Postgres returns this result for the above query,:
regionkey | percentile_cont
-------------+-----------------
0 | 0
1 | 1
2 | 2
3 | 3
4 | 4
(5 rows)
But my question is in the real world, would someone would try to run such type of queries? I am new to aggregate functions, specially the percentile functions.
You would use
percentile_cont()to get a percentage of some ordered value. For instance, if you had apopulationcolumn for the region, then you would calculate the median population as:Using
regionkeyin theORDER BYis non-sensical. Within each group,regionkeyhas only one value. Hence, thePERCENTILE_CONT()can only return that value.However, it can be quite useful with almost any other column.