When should Redis HyperLogLog be avoided and why?

689 views Asked by At

I have some basic ideas of how Redis HyperLogLog works and when to use it.
Before using it I did a test: I pfadded some consecutive numbers to an HLL entry (to mimic user ids), and Redis soon gave a false positive result. To be exact, if you pfadd number 193 to an HLL entry, number 202 would be reported already existing in that entry. You may test it in redis-cli :

127.0.0.1:6379> del ns
(integer) 0
127.0.0.1:6379> PFADD ns 193
(integer) 1
127.0.0.1:6379> PFADD ns 202
(integer) 0

I know that HyperLogLog is a probabilistic data structure, but isn't it too easy for HLL to give false positives like this? Did I misunderstand something?

1

There are 1 answers

1
Itamar Haber On
  1. You misunderstood PFADD's reply - it does not indicate existence but rather whether an internal register in the data structure had changed.
  2. You also misunderstand HLL's use - it estimates set cardinality (size), not set membership.

For a similar (in the probablistic sense) data structure that does set membership, check Bloom Filters and their Redis implementation, Rebloom (http://rebloom.io).