Given a long array of latencies in milliseconds, I want to calculate percentiles from them. I found the method below, which does the job, but I am not sure how to verify that it gives accurate results.
    public static long[] percentiles(long[] latencies, double... percentiles) {
        Arrays.sort(latencies, 0, latencies.length);
        long[] values = new long[percentiles.length];
        for (int i = 0; i < percentiles.length; i++) {
            int index = (int) (percentiles[i] * latencies.length);
            values[i] = latencies[index];
        }
        return values;
    }
I would like to get the 50th, 95th, 99th and 99.9th percentiles from the latencies array.

    long[] percs = percentiles(latencies, 0.5, 0.95, 0.99, 0.999);
Is this the right way to get percentile given a long array of latencies? I am working with Java 7.
According to Wikipedia, there is no standard definition of percentile; however, they give a few possible definitions. The code you've posted appears to be closest to the Nearest Rank Method, but it's not quite the same.
The formula they give is

$$n = \left\lceil \frac{P}{100} \times N \right\rceil$$

where N is the length of the list, P is the percentile, and n is the ordinal rank. You've already done the division by 100. Looking at the examples they give, it's clear that the "ordinal rank" is the index in the list, but it's 1-relative. Thus, to get an index into a Java array, you'd have to subtract 1. Therefore, the correct formula should be

$$\text{index} = \left\lceil \text{percentile} \times N \right\rceil - 1$$

Using the variables in your code, the Java equivalent would be

    int index = (int) Math.ceil(percentiles[i] * latencies.length) - 1;
This is not quite the code you've written. When you cast a double to an int, the result is rounded toward 0, i.e. for non-negative values it's the equivalent of the "floor" function. So your code computes

$$\text{index} = \left\lfloor \text{percentile} \times N \right\rfloor$$

If percentiles[i] * latencies.length is not an integer, the result is the same either way. However, if it is an integer, so that "floor" and "ceiling" are the same value, then the result will be different.

An example from Wikipedia is to compute the 40th percentile when the list is {15, 20, 35, 40, 50}. Their answer is to find the second item in the list, i.e. 20, because 0.40 * 5 = 2.0, and ceiling(2.0) = 2.0.
However, your code:

    int index = (int) (percentiles[i] * latencies.length);

will result in index being 2, which isn't what you want, because that will give you the third item in the list instead of the second.

So in order to match the Wikipedia definition, your computation of the index will need to be modified a little. (On the other hand, I wouldn't be surprised if someone comes along and says your computation is correct and Wikipedia is wrong. We'll see...)
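
To make the difference concrete, here is a minimal sketch of the method rewritten to follow the Nearest Rank definition, in plain Java 7. The class name NearestRankPercentiles is made up for illustration. Sorting a copy, rejecting percentiles outside [0, 1], and clamping the index are my own assumptions about what you'd want, not part of the Wikipedia definition.

```java
import java.util.Arrays;

class NearestRankPercentiles {

    // Nearest-rank percentiles. Each percentile is a fraction in [0, 1], e.g. 0.95.
    static long[] percentiles(long[] latencies, double... percentiles) {
        if (latencies.length == 0) {
            throw new IllegalArgumentException("latencies must not be empty");
        }
        // Sort a copy so the caller's array is left untouched (an assumption;
        // the original method sorts in place).
        long[] sorted = Arrays.copyOf(latencies, latencies.length);
        Arrays.sort(sorted);

        long[] values = new long[percentiles.length];
        for (int i = 0; i < percentiles.length; i++) {
            double p = percentiles[i];
            if (p < 0.0 || p > 1.0) {
                throw new IllegalArgumentException("percentile out of range: " + p);
            }
            // ceil gives the 1-relative ordinal rank; subtract 1 for a 0-relative index.
            int index = (int) Math.ceil(p * sorted.length) - 1;
            // p == 0 would give -1, so clamp into the valid range (assumed behaviour).
            index = Math.max(0, Math.min(index, sorted.length - 1));
            values[i] = sorted[index];
        }
        return values;
    }

    public static void main(String[] args) {
        long[] sample = {15, 20, 35, 40, 50};
        // Wikipedia's example: the 40th percentile of this list is 20.
        System.out.println(Arrays.toString(percentiles(sample, 0.40, 0.50, 1.0)));
        // prints [20, 35, 50]
    }
}
```

Note that a percentile of 1.0 is handled here, whereas the floor-based index would be latencies.length and throw ArrayIndexOutOfBoundsException. One caveat: the product is computed in floating point, so for some inputs it can land slightly above an integer and push the rank up by one. If that matters, you could pass percentiles as integers (e.g. per-mille) and do the ceiling with integer arithmetic.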