So i'm having trouble with calculating medians in T-SQL properly.
I've got a CTE that looks like this:
Medians AS (
SELECT
ParishCode AS PAR21CD,
ChangeDate AS 'Month Start',
PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY Listing_Price) OVER (PARTITION BY ParishCode, ChangeDate) AS Median_Rent
FROM
Joined
GROUP BY
ParishCode, ChangeDate, Listing_Price
)
This pulls from property level data, so i've got a table with lots of properties and their rental prices, and i want the median rent at Parish level.
However it looks like if one value in Listing_Price repeats (say there are two properties with the same price in the same parish and date) it only uses that value once. If there are many of the same value then that would skew the results.
Am i missing something? I know there is a MEDIAN function but i don't think we have access to it. Is there any way of including all values in the calculation?
As charlieface suggested, the problem was GROUP BY - PERCENTILE_CONT is calculated after grouping because it's an analytical function not an aggregate.
Removing the group by and adding distinct worked in my case.