How to calculate quartiles grouped by a column?


Let's say I have a table

VAL     PERSON
  1          1
  2          1
  3          1
  4          1
  2          2
  4          2
  6          2
  3          3
  6          3
  9          3
  12         3
  15         3

And I'd like to calculate the quartiles for each person.

I understand I can easily calculate those for a single person like so:

SELECT
    VAL,
    NTILE(4) OVER(ORDER BY VAL) AS QUARTILE
FROM t  -- t is a placeholder table name
WHERE PERSON = 1;

This gets me the desired results:

VAL    QUARTILE
1      1
2      2
3      3
4      4

Problem is, I'd like to do this for every person. I know something like this would do the job:

SELECT
    PERSON,
    VAL,
    NTILE(4) OVER(ORDER BY VAL) AS QUARTILE
FROM t  -- t is a placeholder table name
WHERE PERSON = 1
UNION
SELECT
    PERSON,
    VAL,
    NTILE(4) OVER(ORDER BY VAL) AS QUARTILE
FROM t
WHERE PERSON = 2
UNION
SELECT
    PERSON,
    VAL,
    NTILE(4) OVER(ORDER BY VAL) AS QUARTILE
FROM t
WHERE PERSON = 3
UNION
SELECT
    PERSON,
    VAL,
    NTILE(4) OVER(ORDER BY VAL) AS QUARTILE
FROM t
WHERE PERSON = 4

But what if a new person is added to the table? Then I'd have to change the SQL code. Any suggestions?


There are 2 answers

Gordon Linoff

ntile() doesn't handle ties very well. You can easily see this with an example:

select v.x, ntile(2) over (order by x) as tile
from (values (1), (1), (1), (1)) v(x);

which returns:

x tile
1   1
1   1
1   2
1   2

Same value, different tiles. This gets worse if you are keeping track of which tile a value is in: because the order among tied rows is arbitrary, different rows can land in different tiles on different runs of the same query, even when the data does not change.

Normally, you would want rows with the same value to have the same quartile, even when the tiles are not the same size. For this reason, I recommend an explicit calculation using rank() instead:

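-- Relies on integer division (as in SQL Server or PostgreSQL).
-- Where / returns a decimal (e.g. MySQL, Oracle), wrap the division in floor().
select t.*,
       ((seqnum - 1) * 4 / cnt) + 1 as quartile
from (select t.*,
             rank() over (partition by person order by val) as seqnum,  -- ties share a rank
             count(*) over (partition by person) as cnt                 -- rows per person
      from t
     ) t;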

If you actually want values split among tiles, then use row_number() rather than rank().
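To make the difference between rank() and row_number() concrete, here is a minimal sketch that runs the query above against the question's data using Python's built-in sqlite3 module. The choice of SQLite (3.25 or later, for window functions), the table name t, and the extra person 4 with four tied values are all assumptions added for illustration; they are not part of the original question.

```python
import sqlite3

# Sketch only: SQLite and the table name t are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (val INTEGER, person INTEGER)")
rows = [(1, 1), (2, 1), (3, 1), (4, 1),
        (2, 2), (4, 2), (6, 2),
        (3, 3), (6, 3), (9, 3), (12, 3), (15, 3),
        (7, 4), (7, 4), (7, 4), (7, 4)]  # person 4 is hypothetical: four tied values
conn.executemany("INSERT INTO t VALUES (?, ?)", rows)

QUERY = """
SELECT person, val,
       ((seqnum - 1) * 4 / cnt) + 1 AS quartile  -- integer division in SQLite
FROM (SELECT t.*,
             {ranker}() OVER (PARTITION BY person ORDER BY val) AS seqnum,
             COUNT(*) OVER (PARTITION BY person) AS cnt
      FROM t) AS s
ORDER BY person, val, quartile
"""

def quartiles(ranker):
    """Return {person: [quartile, ...]} using rank or row_number as the ranker."""
    result = {}
    for person, _val, quartile in conn.execute(QUERY.format(ranker=ranker)):
        result.setdefault(person, []).append(quartile)
    return result

by_rank = quartiles("rank")
by_row_number = quartiles("row_number")
print(by_rank)        # person 4: all tied rows share quartile 1
print(by_row_number)  # person 4: tied rows are spread over quartiles 1 to 4
```

For persons 1 to 3, where no values are tied, both variants give the same quartiles; they only diverge for the tied rows of the made-up person 4.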

Alvaro Parra

Why don't you try using PARTITION BY?

SELECT
  PERSON,
  VAL,
  NTILE(4) OVER(PARTITION BY PERSON ORDER BY VAL) AS QUARTILE
FROM t;  -- t is a placeholder for the actual table name
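As a quick check, the PARTITION BY query can be run against the sample data from the question. The sketch below uses Python's built-in sqlite3 module with an in-memory database; SQLite (3.25 or later) and the table name t are assumptions for illustration, and other databases with window functions should behave the same way for this data.

```python
import sqlite3

# Sketch only: SQLite and the table name t are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (val INTEGER, person INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [(1, 1), (2, 1), (3, 1), (4, 1),
     (2, 2), (4, 2), (6, 2),
     (3, 3), (6, 3), (9, 3), (12, 3), (15, 3)],
)

# One query covers every person; new persons need no change to the SQL.
ntile_rows = conn.execute("""
    SELECT person, val,
           NTILE(4) OVER (PARTITION BY person ORDER BY val) AS quartile
    FROM t
    ORDER BY person, val
""").fetchall()

for person, val, quartile in ntile_rows:
    print(person, val, quartile)
```

Person 3 has five rows, so NTILE(4) places the two smallest values in the first tile; person 2 has only three rows, so the fourth tile stays empty.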

Greetings