Is this a valid PERCENTILE_CONT SQL query?

596 views Asked by At

I am trying to run a SQL query to find a 50th percentile in a table within a certain group, but then i am also grouping the result over the same field. Here is my query, for example over the tpch's nation table:

SELECT 
   regionkey,
   PERCENTILE_CONT(0.5) WITHIN GROUP(ORDER BY regionkey)
FROM "tpch/nation"
GROUP BY regionkey
ORDER BY regionkey

Is this a valid query? I am trying to QA my code by running all kinds of different percentile queries. Postgres returns this result for the above query,:

 regionkey    | percentile_cont 
 -------------+-----------------
            0 |               0
            1 |               1
            2 |               2
            3 |               3
            4 |               4
 (5 rows)

But my question is in the real world, would someone would try to run such type of queries? I am new to aggregate functions, specially the percentile functions.

1

There are 1 answers

1
Gordon Linoff On BEST ANSWER

You would use percentile_cont() to get a percentage of some ordered value. For instance, if you had a population column for the region, then you would calculate the median population as:

SELECT regionkey,
       PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY population)
FROM "tpch/nation"
GROUP BY regionkey
ORDER BY regionkey;

Using regionkey in the ORDER BY is non-sensical. Within each group, regionkey has only one value. Hence, the PERCENTILE_CONT() can only return that value.

However, it can be quite useful with almost any other column.