Statistical method to know when enough performance test iterations have been performed


I'm doing some performance/load testing of a service. Imagine the test function like this:

bytesPerSecond = test(filesize: 10MB, concurrency: 5)

Using this, I'll populate a table of results for different sizes and levels of concurrency. There are other variables too, but you get the idea.

The test function spins up concurrency requests and tracks throughput. This rate starts off at zero, then spikes and dips until it eventually stabilises on the 'true' value.

However, it can take a while for this stability to occur, and there are a lot of input combinations to evaluate.

How can the test function decide when it's performed enough samples? By enough, I suppose I mean that the result isn't going to change beyond some margin if testing continues.

I remember reading an article about this a while ago (from one of the jsperf authors) that discussed a robust method, but I cannot find the article any more.

One simple method would be to compute the standard deviation over a sliding window of values. Is there a better approach?


There are 2 answers

Ami Tavory (best answer)

IIUC, you're describing the classic problem of estimating a confidence interval for the mean when the variance is unknown. That is, suppose you have n results, $x_1, \dots, x_n$, where each $x_i$ is a sample from some process about which you know little: not the mean, not the variance, and not the shape of the distribution. For some required interval width and confidence level, you'd like to know whether n is large enough that, with high probability, the true mean lies within that interval around your sample mean.

(Note that under relatively weak conditions, the Central Limit Theorem guarantees that the distribution of the sample mean converges to a normal distribution, but to apply it directly you would need the variance.)

So, in this case, the classic way to determine whether n is large enough is as follows:

  • Start by calculating the sample mean $\mu = \frac{1}{n} \sum_i x_i$. Also calculate the sample variance with the n − 1 normalization, $s^2 = \frac{1}{n - 1} \sum_i (x_i - \mu)^2$.

  • Depending on the size of n:

    • If n ≥ 30, the confidence interval is approximated as $\mu \pm z_{\alpha/2} \, s / \sqrt{n}$, where $z_{\alpha/2}$ is the standard normal critical value for confidence level $1 - \alpha$ (about 1.96 for 95%).

    • If n < 30, the confidence interval is approximated as $\mu \pm t_{\alpha/2} \, s / \sqrt{n}$, where $t_{\alpha/2}$ is the critical value of Student's t distribution with n − 1 degrees of freedom, available from standard t tables.

  • If the interval is narrow enough, stop. Otherwise, increase n.
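The steps above can be sketched in Python roughly as follows. This is only an illustration under some assumptions: `take_sample`, `rel_margin`, `min_n` and `max_n` are hypothetical names, "narrow enough" is taken to mean that the half-width is within a fraction of the mean, and only the z branch is implemented by collecting at least 30 samples before the first check. The standard library has no t quantile function; with SciPy installed, `scipy.stats.t.ppf` could supply the t critical value for smaller n.

```python
import math
import random
import statistics


def confidence_half_width(samples, confidence=0.95):
    """Half-width z * s / sqrt(n) of the large-sample interval for the mean."""
    n = len(samples)
    s = statistics.stdev(samples)  # sample standard deviation, n - 1 denominator
    z = statistics.NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z * s / math.sqrt(n)


def run_until_stable(take_sample, rel_margin=0.02, confidence=0.95,
                     min_n=30, max_n=10_000):
    """Sample until the half-width is within rel_margin of the mean.

    take_sample is a hypothetical callable returning one throughput reading.
    Returns (mean, number_of_samples); gives up after max_n samples.
    """
    samples = [take_sample() for _ in range(min_n)]
    while len(samples) < max_n:
        mean = statistics.fmean(samples)
        if confidence_half_width(samples, confidence) <= rel_margin * abs(mean):
            break
        samples.append(take_sample())
    return statistics.fmean(samples), len(samples)


if __name__ == "__main__":
    rng = random.Random(0)
    # Simulated readings: noisy throughput around 100 (units are arbitrary).
    mean, n = run_until_stable(lambda: rng.gauss(100.0, 15.0))
    print(f"mean={mean:.1f} after {n} samples")
```

The readings fed in should come from after the warm-up phase, since the interval assumes the samples are drawn from the same steady process.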

Atilla Ozgur

Stability means the rate of change (the derivative) is zero or close to zero.

"The test function spins up concurrency requests and tracks throughput. This rate starts off at zero, then spikes and dips until it eventually stabilises on the 'true' value."

I would keep track of your past throughput values, for example the last X values. From these values, I would calculate the rate of change (the derivative of your throughput). If the derivative is close to zero, the test is stable and I would stop it.

How do you choose X? Instead of a constant such as 10, I think a value based on the maximum number of tests is more suitable, for example:

 X = max(10, max_test_count * 0.01)
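A rough Python sketch of this idea, under a few assumptions of mine: the derivative is estimated as the least-squares slope over the last X readings, "close to zero" means the slope per sample is below `rel_tol` times the window mean, and X is computed with integer division so it can be used as a window length. The names `is_stable`, `run_until_flat` and `rel_tol` are hypothetical.

```python
from collections import deque
import statistics


def window_size(max_test_count):
    # X = max(10, 1% of the maximum number of tests), rounded down.
    return max(10, max_test_count // 100)


def is_stable(window, rel_tol=0.01):
    """True if the least-squares slope over the window is close to zero.

    rel_tol is an assumed tolerance: the slope per sample must not exceed
    rel_tol * |mean of the window|.
    """
    n = len(window)
    x_mean = (n - 1) / 2
    y_mean = statistics.fmean(window)
    num = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(window))
    den = sum((x - x_mean) ** 2 for x in range(n))
    slope = num / den
    return abs(slope) <= rel_tol * abs(y_mean)


def run_until_flat(readings, max_test_count=1000, rel_tol=0.01):
    """Consume readings until the last X of them show no trend.

    Returns (number_of_readings_used, mean_of_window), or None if the
    readings run out before the window looks flat.
    """
    window = deque(maxlen=window_size(max_test_count))
    for count, value in enumerate(readings, start=1):
        window.append(value)
        if len(window) == window.maxlen and is_stable(window, rel_tol):
            return count, statistics.fmean(window)
    return None
```

A slope near zero only shows that there is no trend within the window; it says nothing about how noisy the readings are, so it may be worth combining with a check on the spread.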