Can I use Bloom Filters in searching?

Question

Can I use Bloom Filters in searching?

288 views Asked by Alex Sparrow At 13 March 2023 at 10:03

I have a single table in my database that contains 40 million user data entries. My goal is to determine whether a user is present or not, based on their unique ID.

However, if I were to provide an incorrect ID, the database would search through all 40 million entries before returning an empty result set.

To optimize this process, I am considering not checking the database at all if the user ID is not present in the database. This would help to reduce the number of unnecessary queries made to the database.

Would it be possible to use a bloom filter in this case?

Original Q&A

There are 2 answers

**AndrewR** · Answer 1 · 2023-03-16T15:54:56+00:00

Traditional approach is to build an index on data and do look ups there. That will give you sub linear lookup time (in big O notation terms).

Even with sub linear time, you still may get some meaningful gains with a bloom filter. If there are many lookup for non existing records, then a bloom filter will be a cheap approach to terminate those lookups early.

Overall, having an index and/or a bloom filter adds complexity to your system; this is where I would explore how the system performs around given requirements.

**erickson** · Answer 2 · 2023-03-16T16:09:12+00:00

If a key is added to a Bloom filter, the filter will not falsely report that it is absent in later queries. However, if a key is never added to a Bloom filter, the filter may falsely report its presence. In other words, you won't get false negatives, but you may get false positives. This probability of a false positive depends on the size of the filter. This doesn't seem like a huge problem; there is only a small chance of doing a search unnecessarily, and in the end, you get the correct result either way.

However, you say that your user IDs are stored in a "table" in a "database." If it's an indexed field in a relational database, you should be able to test for its presence in O(log(n)) time; for 40 million entries, this is on the order of 25 operations, and I would expect this time to be dwarfed by overhead like network traffic. And, with this approach, the index search can easily result in additional information, like the user record, while a Bloom filter check can't associate values with a key.

So, my recommendation would be to use something that supports a traditional index, and avoid the novelty of a Bloom filter for this. I would look for alternatives only when supported by profiling metrics that show that lookup misses are a bottleneck.

TechQA.

Can I use Bloom Filters in searching?

There are 2 answers

Related Questions in JAVA

Related Questions in DATABASE

Related Questions in SYSTEM-DESIGN

Related Questions in BLOOM-FILTER

Popular Questions

Trending Questions