How to understand that the standard error of Redis HyperLogLog is 0.81%


I am confused by the HyperLogLog standard error of 0.81%, so I changed rand() to $n+$j in https://github.com/redis/redis/blob/unstable/tests/unit/hyperloglog.tcl#L48

and changed 5% to 0.81% in https://github.com/redis/redis/blob/unstable/tests/unit/hyperloglog.tcl#L53

but then an error occurs.

1 answer

Answered by Lior Kogan

The returned cardinality of the observed set is not exact, but estimated with a standard error of 0.81% of the real cardinality.

In other words, the standard deviation of the difference between the real cardinality and the estimated cardinality will be 0.81% of the real cardinality.

In simpler terms:

  • For ~68.27% of the estimates, the error is expected to be less than 0.81% (1σ)
  • For ~95.45% of the estimates, the error is expected to be less than 1.62% (2σ)
  • For ~99.73% of the estimates, the error is expected to be less than 2.43% (3σ)

5% is 6.17σ, so the error is expected to be less than 5% for more than 99.9999999% of the estimates.
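
The percentages above can be reproduced with a few lines of Python. This is only a sketch: it assumes the relative error is roughly normally distributed with a standard deviation of 0.81%, and the names `STD_ERROR`, `fraction_within` and `fraction_outside` are made up for illustration, not taken from Redis.

```python
import math

# Assumption: the relative error is approximately normal with this standard deviation.
STD_ERROR = 0.0081  # 0.81%


def fraction_within(threshold, sigma=STD_ERROR):
    """Fraction of estimates whose relative error is below `threshold`."""
    k = threshold / sigma  # threshold expressed in standard deviations
    return math.erf(k / math.sqrt(2))


def fraction_outside(threshold, sigma=STD_ERROR):
    """Fraction of estimates whose relative error exceeds `threshold`."""
    k = threshold / sigma
    # erfc avoids the precision loss of computing 1 - erf(...) for large k
    return math.erfc(k / math.sqrt(2))


for threshold in (0.0081, 0.0162, 0.0243, 0.05):
    k = threshold / STD_ERROR
    print(
        f"{threshold:.2%} is {k:.2f} sigma: "
        f"{fraction_within(threshold):.8%} within, "
        f"{fraction_outside(threshold):.2e} outside"
    )
```

Under the same assumption, roughly 32% of the estimates fall outside 0.81%, so a check that requires every estimate to be within 0.81% should be expected to fail often, while a 5% bound almost never will.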

Of course, some statistical assumptions, regarding both the data and the queries, apply.