We collect certain metrics using Graphite + Grafana and use them to monitor system health and performance.
For one of the latency metrics, we get the total time as well as the latencies of all the sub-components it is composed of.
We display the 99th percentile for all the values. However, if we sum up the 99th percentiles of the sub-component latencies, the result does not equal the 99th percentile of the total time.
Essentially, it comes down to whether percentiles follow summation rules, i.e.
if
a + b + c + d = s
then,
p99(a) + p99(b) + p99(c) + p99(d) = p99(s) ?
Will this hold?
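
One way to check this empirically is a small simulation. The sketch below is only an illustration under stated assumptions: four independent sub-components with exponentially distributed latencies (mean 10 ms) and a nearest-rank percentile. None of these come from the real system, and the names `percentile` and `simulate` are hypothetical.

```python
import math
import random


def percentile(values, p):
    """Nearest-rank percentile: smallest sample with at least p% of samples at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def simulate(n=100_000, seed=42):
    rng = random.Random(seed)
    # Assumption: four independent sub-component latencies per request,
    # each exponentially distributed with a mean of 10 ms.
    components = [[rng.expovariate(1 / 10) for _ in range(n)] for _ in range(4)]
    # Total latency of each request is the sum of its sub-component latencies.
    totals = [sum(parts) for parts in zip(*components)]
    sum_of_p99s = sum(percentile(c, 99) for c in components)
    p99_of_total = percentile(totals, 99)
    return sum_of_p99s, p99_of_total


sum_of_p99s, p99_of_total = simulate()
print(f"sum of p99s:  {sum_of_p99s:.1f} ms")
print(f"p99 of total: {p99_of_total:.1f} ms")
```

Under these assumptions the sum of the p99s comes out well above the p99 of the total, because the slowest 1% of each sub-component rarely falls on the same request. With strongly correlated sub-components the two numbers would move closer together.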
IMHO this would be true only if |a| = |b| = |c| = |d|, i.e. each component is called the same number of times. If this is not the case, you should weight your equation by the number of times you pass through each component.
Imagine you have only components 'a' and 'b'. If, for every 100 requests passing through component 'a', 'b' is called 900 times, then 0.1*p99(a) + 0.9*p99(b) = p99(a+b)
PS: you should remove your 'java' tags, and maybe 'graphite' and 'grafana' tags too.