I have some basic ideas of how Redis HyperLogLog works and when to use it.
Before using it I did a test: I pfadd
ed some consecutive numbers to an HLL entry (to mimic user ids), and Redis soon gave a false positive result. To be exact, if you pfadd
number 193 to an HLL entry, number 202 would be reported already existing in that entry. You may test it in redis-cli :
127.0.0.1:6379> del ns
(integer) 0
127.0.0.1:6379> PFADD ns 193
(integer) 1
127.0.0.1:6379> PFADD ns 202
(integer) 0
I know that HyperLogLog is a probabilistic data structure, but isn't it too easy for HLL to give false positives like this? Did I misunderstand something?
PFADD
's reply - it does not indicate existence but rather whether an internal register in the data structure had changed.For a similar (in the probablistic sense) data structure that does set membership, check Bloom Filters and their Redis implementation, Rebloom (http://rebloom.io).