Calculating median values including all values in SQL

20 views Asked by At

So i'm having trouble with calculating medians in T-SQL properly.

I've got a CTE that looks like this:

Medians AS (
    SELECT
        ParishCode AS PAR21CD,
        ChangeDate AS 'Month Start',
        PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY Listing_Price) OVER (PARTITION BY ParishCode, ChangeDate) AS Median_Rent
    FROM
        Joined
    GROUP BY
        ParishCode, ChangeDate, Listing_Price
)

This pulls from property level data, so i've got a table with lots of properties and their rental prices, and i want the median rent at Parish level.

However it looks like if one value in Listing_Price repeats (say there are two properties with the same price in the same parish and date) it only uses that value once. If there are many of the same value then that would skew the results.

Am i missing something? I know there is a MEDIAN function but i don't think we have access to it. Is there any way of including all values in the calculation?

1

There are 1 answers

0
GlassShark1 On

As charlieface suggested, the problem was GROUP BY - PERCENTILE_CONT is calculated after grouping because it's an analytical function not an aggregate.

Removing the group by and adding distinct worked in my case.

Medians AS (
    SELECT DISTINCT
        ParishCode AS PAR21CD,
        ChangeDate AS 'Month Start',
        PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY Listing_Price) OVER (PARTITION BY ParishCode, ChangeDate) AS Median_Rent
    FROM
        Joined
)