I'm successfully using Welford's method to compute running variance and standard deviation, as described many times on Stack Overflow and in John D. Cook's excellent blog post.
However, in the stream of samples I sometimes encounter a "rollback" or "remove sample" order, meaning that a previous sample is no longer valid and should be removed from the calculation. I know the value of the sample to remove and when it was processed. But I'm using Welford precisely because I cannot go back and do another pass over all the data.
Is there an algorithm to successfully adjust my running variance to remove or negate a specific previously processed sample?
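Given the forward formulas

$$M_k = M_{k-1} + \frac{x_k - M_{k-1}}{k}, \qquad S_k = S_{k-1} + (x_k - M_{k-1})(x_k - M_k),$$

it's possible to solve for $M_{k-1}$ as a function of $M_k$, $x_k$, and $k$:

$$M_{k-1} = M_k - \frac{x_k - M_k}{k - 1}$$

Then we can derive $S_{k-1}$ straightforwardly from $S_k$ and the rest:

$$S_{k-1} = S_k - (x_k - M_{k-1})(x_k - M_k)$$

It's not necessary that $x_k$ be the last sample here; since $M_k$ and $S_k$ theoretically do not depend on the order of the input, we can pretend that the sample to be removed was the last to be added.

I have no idea if this is stable.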