moving average accuracy?

813 views Asked by At

I have some data where people vote on things, and it would be nice to have an average for each item of how everyone who has voted on it has voted. You can think of the votes as a stream of constantly incoming numbers. Now I can figure out the average exactly but to do so I have to store two numbers, either the total or the current average and the count of how many items have been seen so far. If I do this I can use

AVG[n+1] = (AVG[n]*count + item)/(count+1)

but this is a pain since it forces me to store two pieces of data for each item I want to have votes for. There is another way which I know of called the moving average or iterator average which can work on streaming data but will only give an approximate average like so:

AVG[n+1] = alpha*(item - AVG[n]) + AVG[n]

where alpha is some small fixed learning rate. This simply tries to move the new average in the direction of the new item and by an amount proportional to the difference between that new item and the current estimate. This gives me a way to only have to store one number (the current average) and still be able to update it when a new item comes in but at the cost of this only being an approximation.

I would like to know if there are any known bounds on the error this method introduces... is there a formula to estimate how far off from the truth this estimate is, and also how should I choose a good alpha? More info on this in this post and this question.

0

There are 0 answers