Let's say I have a table:
VAL PERSON
1 1
2 1
3 1
4 1
2 2
4 2
6 2
3 3
6 3
9 3
12 3
15 3
And I'd like to calculate the quartiles for each person.
I understand I can easily calculate those for a single person as such:
SELECT
VAL,
NTILE(4) OVER(ORDER BY VAL) AS QUARTILE
FROM t -- placeholder: the table name is not given in the question
WHERE PERSON = 1;
This gets me the desired results:
VAL QUARTILE
1 1
2 2
3 3
4 4
Problem is, I'd like to do this for every person. I know something like this would do the job:
-- t is a placeholder: the table name is not given in the question
SELECT
PERSON,
VAL,
NTILE(4) OVER(ORDER BY VAL) AS QUARTILE
FROM t
WHERE PERSON = 1
UNION
SELECT
PERSON,
VAL,
NTILE(4) OVER(ORDER BY VAL) AS QUARTILE
FROM t
WHERE PERSON = 2
UNION
SELECT
PERSON,
VAL,
NTILE(4) OVER(ORDER BY VAL) AS QUARTILE
FROM t
WHERE PERSON = 3
UNION
SELECT
PERSON,
VAL,
NTILE(4) OVER(ORDER BY VAL) AS QUARTILE
FROM t
WHERE PERSON = 4
But what if there's a new person in the table? Then I'd have to change the SQL code. Any suggestions?
ntile() doesn't handle ties very well. You can easily see this with an example in which every row has the same value: the rows are still spread across different tiles.
Same value. Different tiles. This gets worse if you are keeping track of which tile a value is in. Different rows can have different tiles on different runs of the same query -- even when the data does not change.
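To make the tie problem concrete, here is a minimal sketch. It assumes a dialect that accepts a VALUES list as a derived table (PostgreSQL and SQL Server do); the alias v and the column x are hypothetical names chosen for illustration.

```sql
-- Four rows, all with the same value, split into two tiles.
-- v and x are illustrative names, not from the question.
SELECT v.x,
       NTILE(2) OVER (ORDER BY v.x) AS tile
FROM (VALUES (1), (1), (1), (1)) v(x);
```

All four rows have x = 1, yet two of them get tile 1 and the other two get tile 2, because ntile() splits rows by position in the ordering, not by value.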
Normally, you would want rows with the same value to have the same quartile, even when the tiles are not the same size. For this reason, I recommend an explicit calculation using rank() instead. If you actually want values split among tiles, then use row_number() rather than rank().
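One way such an explicit calculation might look is sketched below. The table name t is a placeholder (the question does not name the table), and the arithmetic assumes that dividing two integers yields an integer, as in PostgreSQL and SQL Server.

```sql
-- Sketch only: t is a placeholder table with the PERSON and VAL columns
-- from the question. Assumes integer division (e.g. PostgreSQL, SQL Server).
SELECT t.person,
       t.val,
       -- rank(): tied values share a rank, so they share a quartile
       1 + (RANK() OVER (PARTITION BY t.person ORDER BY t.val) - 1) * 4
           / COUNT(*) OVER (PARTITION BY t.person) AS quartile_by_rank,
       -- row_number(): unique per row, so tied values are split among tiles
       1 + (ROW_NUMBER() OVER (PARTITION BY t.person ORDER BY t.val) - 1) * 4
           / COUNT(*) OVER (PARTITION BY t.person) AS quartile_by_row_number
FROM t
ORDER BY t.person, t.val;
```

The PARTITION BY person clause restarts the calculation for each person, so a new person in the table needs no change to the query. For person 1 in the sample data (values 1, 2, 3, 4), both columns come out as 1, 2, 3, 4. In engines where / does not perform integer division (MySQL and Oracle, for example), the quotient would need to be wrapped in FLOOR().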