Calculate percentile from a long array?

43.2k views Asked by At

Given a long array of latencies which are in milliseconds, I want to calculate percentile from them. I got below method which does the work but I am not sure how I can verify whether this gives me accurate result?

  public static long[] percentiles(long[] latencies, double... percentiles) {
    Arrays.sort(latencies, 0, latencies.length);
    long[] values = new long[percentiles.length];
    for (int i = 0; i < percentiles.length; i++) {
      int index = (int) (percentiles[i] * latencies.length);
      values[i] = latencies[index];
    }
    return values;
  }

I would like to get 50th, 95th, 99th and 99.9th percentile from latencies array.

long[] percs = percentiles(latencies, 0.5, 0.95, 0.99, 0.999);

Is this the right way to get percentile given a long array of latencies? I am working with Java 7.

4

There are 4 answers

0
ajb On

According to Wikipedia, there is no standard definition of percentile; however, they give a few possible definitions. The code you've posted appears to be closest to the Nearest Rank Method, but it's not quite the same.

The formula they give is

n = ceiling((P / 100) x N)

where N is the length of the list, P is the percentile, and n will be the ordinal rank. You've already done the division by 100. Looking at the examples they give, it's clear that the "ordinal rank" is the index in the list, but it's 1-relative. Thus, to get an index into a Java array, you'd have to subtract 1. Therefore, the correct formula should be

n = ceiling(percentile * N) - 1

Using the variables in your code, the Java equivalent would be

(int) Math.ceil(percentiles[i] * latencies.length) - 1

This is not quite the code you've written. When you cast a double to an int, the result is rounded toward 0, i.e. it's the equivalent of the "floor" function. So your code computes

floor(percentiles[i] * latencies.length)

If percentiles[i] * latencies.length is not an integer, the result is the same either way. However, if it is an integer, so that "floor" and "ceiling" are the same value, then the result will be different.

An example from Wikipedia is to compute the 40th percentile when the list is {15, 20, 35, 40, 50}. Their answer is to find the second item in the list, i.e. 20, because 0.40 * 5 = 2.0, and ceiling(2.0) = 2.0.

However, your code:

int index = (int) (percentiles[i] * latencies.length);

will result in index being 2, which isn't what you want, because that will give you the third item in the list, instead of the second.

So in order to match the Wikipedia definition, your computation of the index will need to be modified a little. (On the other hand, I wouldn't be surprised if someone comes along and says your computation is correct and Wikipedia is wrong. We'll see...)

3
user7358693 On

This is what you are looking for:

public static void main(String[] args) {
    List<Long> latencies = new List<Long>() { 3, 6, 7, 8, 8, 9, 10, 13, 15, 16, 20 };
    Collections.sort(latencies);

    System.out.println(percentile(latencies, 25));
    System.out.println(percentile(latencies, 50));
    System.out.println(percentile(latencies, 75));
    System.out.println(percentile(latencies, 100));
}

public static long percentile(List<Long> latencies, double percentile) {
    int index = (int) Math.ceil(percentile / 100.0 * latencies.size());
    return latencies.get(index-1);
}

enter image description here

0
Moshe Bixenshpaner On

If the array is sorted, you should just return the relative element in your array (e.g., p99 in a 1000-elements array is the 990th element).

If the array is unsorted, and in order to calculate percentiles more efficiently, you should probably use something like Quickselect.

0
Zsolt Safrany On
public static double percentile(double percentile, List<Double> items) {
    Preconditions.checkArgument(percentile >= 0);
    Preconditions.checkArgument(percentile <= 100);
    Preconditions.checkArgument(!items.isEmpty());

    Collections.sort(items);
    return items.get((int) Math.round(percentile / 100.0 * (items.size() - 1)));
}


@Test
public void test1() {
    List<Double> list = Arrays.asList(0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0);
    assertThat(percentile(0, list)).isEqualTo(0.0);
    assertThat(percentile(20, list)).isEqualTo(2.0);
    assertThat(percentile(80, list)).isEqualTo(8.0);
    assertThat(percentile(100, list)).isEqualTo(10.0);
}

@Test
public void test2() {
    List<Double> list = Arrays.asList(0.0, 1.0, 2.0, 3.0);
    assertThat(percentile(51, list)).isEqualTo(2.0);
    assertThat(percentile(49, list)).isEqualTo(1.0);
}

@Test
public void test3() {
    List<Double> list = Arrays.asList(42.0);     
    assertThat(percentile(0, list)).isEqualTo(42.0);
    assertThat(percentile(100, list)).isEqualTo(42.0);
}