I recently learned about HyperLogLog (HLL) for counting distinct elements in a large dataset, so I tried using it on API call responses.
- The code collects the ids from the list of ids in each response.
- These ids are ingested into HLL to count the unique ids.
The above process is performed for each API, and the number of unique ids is counted per user. Ids are ingested at a rate of ~15k/sec. I have configured index_bit_count = 16 and min_hash_bit_count = 48.
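To make the process concrete, here is a minimal pure-Python sketch of per-user HLL counting. It is only an illustration of the technique, not the library actually in use: the `HyperLogLog` class, the `ingest_response` helper, the SHA-1 hash and the 12-bit default are all assumptions made for the example, and it has no equivalent of min_hash_bit_count.

```python
import hashlib
import math


class HyperLogLog:
    """Toy HyperLogLog (hypothetical, for illustration only)."""

    def __init__(self, index_bits=12):
        self.p = index_bits            # bits used to pick a register
        self.m = 1 << index_bits       # number of registers
        self.registers = [0] * self.m

    def add(self, item):
        # 64-bit hash taken from the first 8 bytes of SHA-1 (an arbitrary choice).
        digest = hashlib.sha1(str(item).encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        idx = h >> (64 - self.p)                   # top p bits select the register
        rest = h & ((1 << (64 - self.p)) - 1)      # remaining 64 - p bits
        # Rank = position of the leftmost 1 bit in the remaining bits.
        rank = (64 - self.p) - rest.bit_length() + 1
        if rank > self.registers[idx]:
            self.registers[idx] = rank

    def count(self):
        m = self.m
        alpha = 0.7213 / (1 + 1.079 / m)           # bias constant, valid for m >= 128
        raw = alpha * m * m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * m and zeros:
            # Small-range correction: fall back to linear counting.
            return m * math.log(m / zeros)
        return raw


def ingest_response(sketches, user_id, ids):
    """Add every id from one API response to that user's sketch."""
    sketch = sketches.setdefault(user_id, HyperLogLog())
    for item_id in ids:
        sketch.add(item_id)


if __name__ == "__main__":
    sketches = {}  # one sketch per user
    ingest_response(sketches, "user-1", ["a", "b", "c", "a"])
    print(round(sketches["user-1"].count()))
```

One thing this makes visible: duplicates never change a register, so re-ingesting the same ids should not move the estimate. Only the hash quality and the number of registers affect accuracy.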
The problem is that the count from HLL deviates by up to 10% for most users. For some users it is too high, while for others it is too low.
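For comparison, the commonly quoted relative standard error of HyperLogLog is about 1.04 / sqrt(m), where m is the number of registers. The snippet below works that out, assuming index_bit_count is the number of register index bits (so m = 2^index_bit_count); the function name is made up for the example.

```python
import math


def hll_relative_std_error(index_bit_count):
    """Approximate relative standard error of HLL with 2**index_bit_count registers."""
    m = 2 ** index_bit_count
    return 1.04 / math.sqrt(m)


if __name__ == "__main__":
    err = hll_relative_std_error(16)
    print(f"{err:.4%}")  # 1.04 / 256 = 0.0040625, i.e. about 0.41%
```

If that assumption holds, the expected error at index_bit_count = 16 is roughly 0.4%, so a deviation of up to 10% is much larger than the configuration alone would explain.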
Can someone please suggest a config change or an alternative that works at this scale?