I have a single table in my database that contains 40 million user data entries. My goal is to determine whether a user is present or not, based on their unique ID.
However, if I were to provide an incorrect ID, the database would search through all 40 million entries before returning an empty result set.
To optimize this, I am considering skipping the database query entirely when the user ID is known not to exist. This would reduce the number of unnecessary queries made to the database.
Would it be possible to use a bloom filter in this case?
The traditional approach is to build an index on the ID column and do lookups there. That gives you sublinear lookup time (typically O(log n) for a B-tree index), so a lookup for a missing ID no longer scans all 40 million rows.
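As a rough illustration of the indexed lookup, here is a minimal sketch using Python's built-in sqlite3 module. The table name `users`, the column `user_id`, and the index name `idx_users_user_id` are assumptions made for the example, not details from the question, and your database's syntax and query planner may differ.

```python
import sqlite3

# In-memory database standing in for the real one (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (user_id TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [(f"user-{i}", f"name-{i}") for i in range(10_000)],
)


def query_plan(conn):
    """Return the planner's description of the existence query."""
    rows = conn.execute(
        "EXPLAIN QUERY PLAN SELECT 1 FROM users WHERE user_id = ?",
        ("user-42",),
    ).fetchall()
    # The human-readable detail is the fourth column of each row.
    return " ".join(row[3] for row in rows)


# Without an index, the planner has to scan the whole table.
plan_before = query_plan(conn)

# With a unique index on the ID column, the planner can search the index.
conn.execute("CREATE UNIQUE INDEX idx_users_user_id ON users (user_id)")
plan_after = query_plan(conn)


def user_exists(conn, user_id):
    """Existence check that stops at the first matching row."""
    row = conn.execute(
        "SELECT 1 FROM users WHERE user_id = ? LIMIT 1", (user_id,)
    ).fetchone()
    return row is not None


print(plan_before)
print(plan_after)
print(user_exists(conn, "user-42"), user_exists(conn, "no-such-user"))
```

If the ID is already the table's primary key or has a unique constraint, most databases maintain such an index for you, and a miss is already cheap.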
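Even with sublinear lookup time, you may still get meaningful gains from a bloom filter. A bloom filter never reports a present ID as absent, though it can occasionally report an absent ID as present. If there are many lookups for nonexistent records, it is a cheap way to terminate most of those lookups early, before they reach the database.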
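To make the early-termination idea concrete, here is a minimal bloom filter sketch in pure Python. The class and function names (`BloomFilter`, `might_contain`, `user_exists`, `query_db`) are hypothetical, and the SHA-256 double-hashing scheme is just one reasonable choice; in production you would more likely use an existing library or your data store's built-in filter. With the standard sizing formulas, 40 million IDs at a 1% false-positive rate need roughly 48 MB of bits.

```python
import hashlib
import math


class BloomFilter:
    """Illustrative bloom filter; not tuned for production use."""

    def __init__(self, expected_items: int, false_positive_rate: float):
        # Standard sizing: m = -n * ln(p) / (ln 2)^2 bits, k = (m / n) * ln 2 hashes.
        self.num_bits = max(
            1,
            math.ceil(-expected_items * math.log(false_positive_rate) / math.log(2) ** 2),
        )
        self.num_hashes = max(1, round(self.num_bits / expected_items * math.log(2)))
        self.bits = bytearray((self.num_bits + 7) // 8)

    def _positions(self, key: str):
        # Double hashing: derive k bit positions from two 64-bit halves of one digest.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force odd so the step is never zero
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key: str) -> bool:
        # False means definitely absent; True means "possibly present".
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key))


def user_exists(user_id: str, bloom: BloomFilter, query_db) -> bool:
    """Skip the database when the filter says the ID is definitely absent.

    query_db is a hypothetical callable that performs the real lookup.
    """
    if not bloom.might_contain(user_id):
        return False
    return query_db(user_id)  # confirm, since the filter can give false positives
```

The filter has to be populated with every existing ID and updated whenever a user is added. A plain bloom filter does not support deletion, so removed users simply fall through to the database check.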
Overall, adding an index and/or a bloom filter adds complexity to your system, so I would first measure how the system performs against your actual requirements before deciding which of them you need.