Removing a prior sample while using Welford's method for computing single pass variance


I'm successfully using Welford's method to compute a running variance and standard deviation, as described many times on Stack Overflow and in John D. Cook's excellent blog post.

However, in the stream of samples I sometimes encounter a "rollback" or "remove sample" order, meaning that a previous sample is no longer valid and should be removed from the calculation. I know the value of the sample to remove and when it was processed. But I'm using Welford precisely because I cannot go back and do another pass over all the data.

Is there an algorithm to successfully adjust my running variance to remove or negate a specific previously processed sample?

Answer by David Eisenstat (accepted as best answer)

Given the forward formulas

$$M_k = M_{k-1} + \frac{x_k - M_{k-1}}{k},$$
$$S_k = S_{k-1} + (x_k - M_{k-1})(x_k - M_k),$$
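For reference, here is a minimal Python sketch of those forward formulas, assuming the running state is kept as a tuple (k, M_k, S_k) starting from (0, 0.0, 0.0). The name `welford_add` is hypothetical, not from any library.

```python
import math

def welford_add(state, x):
    """Fold sample x into the running state (k, M_k, S_k)."""
    k, m_prev, s_prev = state
    k += 1
    m = m_prev + (x - m_prev) / k        # M_k = M_{k-1} + (x_k - M_{k-1}) / k
    s = s_prev + (x - m_prev) * (x - m)  # S_k = S_{k-1} + (x_k - M_{k-1})(x_k - M_k)
    return k, m, s

state = (0, 0.0, 0.0)  # empty stream: k = 0, M_0 = 0, S_0 = 0
for x in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]:
    state = welford_add(state, x)

k, mean, s = state
population_variance = s / k   # S_k / k
sample_variance = s / (k - 1) # S_k / (k - 1)
print(mean, population_variance, math.sqrt(population_variance))
```

S_k itself is the running sum of squared deviations from the mean; dividing by k or by k - 1 gives the population or sample variance respectively.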

it's possible to solve for $M_{k-1}$ as a function of $M_k$, $x_k$, and $k$:

$$M_{k-1} = M_k - \frac{x_k - M_k}{k - 1}.$$
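The rearrangement takes one step: multiply the forward mean update by $k$, then isolate $M_{k-1}$. It requires $k \ge 2$; when $k = 1$, removing the only sample simply leaves the empty state.

$$k M_k = (k-1) M_{k-1} + x_k \;\Longrightarrow\; M_{k-1} = \frac{k M_k - x_k}{k-1} = M_k - \frac{x_k - M_k}{k-1}.$$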

Then we can derive $S_{k-1}$ straightforwardly from $S_k$ and the rest:

$$S_{k-1} = S_k - (x_k - M_{k-1})(x_k - M_k).$$
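A minimal Python sketch of the two inverse formulas, assuming the same (k, M_k, S_k) tuple state as a plain Welford accumulator. The name `welford_remove` is hypothetical, and the function trusts the caller that x really was one of the k samples.

```python
def welford_remove(state, x):
    """Remove sample x from the running state (k, M_k, S_k).

    Assumes x really is one of the k samples already folded in;
    the state alone cannot verify that.
    """
    k, m, s = state
    if k < 1:
        raise ValueError("cannot remove a sample from an empty state")
    if k == 1:
        # Removing the only sample leaves the empty state (k = 0).
        return 0, 0.0, 0.0
    m_prev = m - (x - m) / (k - 1)       # M_{k-1} = M_k - (x_k - M_k) / (k - 1)
    s_prev = s - (x - m_prev) * (x - m)  # S_{k-1} = S_k - (x_k - M_{k-1})(x_k - M_k)
    return k - 1, m_prev, s_prev

# State after folding in [1, 2, 3, 4]: k = 4, mean 2.5, S = 5.
print(welford_remove((4, 2.5, 5.0), 4.0))  # → (3, 2.0, 2.0), the state for [1, 2, 3]
```

Because of rounding, S can drift slightly below zero after many removals; clamping with `max(s_prev, 0.0)` is one possible guard, though it is not part of the answer's derivation.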

It's not necessary that $x_k$ be the last sample here: since $M_k$ and $S_k$ do not, in exact arithmetic, depend on the order of the input, we can pretend that the sample to be removed was the last one added.
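To illustrate the order-independence argument, this sketch removes a sample from the middle of the stream and compares the result with Python's `statistics` module run over the remaining samples. The helpers `add` and `remove` are hypothetical names for the forward and inverse updates above; agreement is only up to floating-point rounding.

```python
import math
import statistics

def add(state, x):
    # Forward Welford update on (k, M_k, S_k).
    k, m, s = state
    k += 1
    m_new = m + (x - m) / k
    return k, m_new, s + (x - m) * (x - m_new)

def remove(state, x):
    # Inverse update: treat x as if it were the last sample added.
    k, m, s = state
    if k == 1:
        return 0, 0.0, 0.0
    m_old = m - (x - m) / (k - 1)
    return k - 1, m_old, s - (x - m_old) * (x - m)

data = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0]
state = (0, 0.0, 0.0)
for x in data:
    state = add(state, x)

# Roll back the third sample (4.0), which was neither first nor last.
state = remove(state, 4.0)
remaining = [3.0, 1.0, 1.0, 5.0, 9.0, 2.0, 6.0]

k, m, s = state
print(m, statistics.mean(remaining))
print(s / (k - 1), statistics.variance(remaining))  # sample variance on both sides
```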

I have no idea whether this is numerically stable.