https://github.com/joeyrobert/bloomfilter uses Random class for the hash function which is a performance killer.
What I'm trying to do is input the class with byte[]s instead of a generic argument(T) and get rid of
private int Hash(T item) { return item.GetHashCode(); }
I know there is a huge performance benefit but I have no idea how to replace _random.Next(_bitSize)
here:
#region Public Methods
/// <summary>
/// Adds an item to the bloom filter.
/// </summary>
/// <param name="item">Item to be added</param>
public void Add(T item)
{
_random = new Random(Hash(item));
for (int i = 0; i < _numberOfHashes; i++)
_bitArray[_random.Next(_bitSize)] = true;
}
With some non-retarded line of code that doesn't take thousands of CPU cycles for every single bit.
I know there are lots of other problems with the code that can make it faster/safer.I have them(mostly) fixed and just got stuck on the last one before pushing my changes.
Any help is really appreciated.
I lack to see why you would want to use a random number generator here... however, I can help you to speed things up.
A bloom filter is basically a bitvector, where you set bits. If you want to figure out if an item exists, the bloom filter will give you a true if the item possibly exists and a false if the item for sure doesn't exist.
(I'm doing this in a simple text editor so there might be some bugs in the code)
I'm going to assume your hash space here can use 32-bit integer calculations; if you have a very large bloom table, you probably want to use a 64-bit integer.
The easiest (and probably the fastest) implementation of a bloom filter is thus:
You can experiment with changing the
byte[]
to auint[]
or aulong[]
; I'm not sure if that makes a difference.If you then want to check if an item might exist, you calculate the same index and bit, and get the result.
The only thing that remains here is the speed at which you can calculate the hash. If you're using an integer, I'd simply multiply it with a large prime number; if you're using a SHA256 fixed-length byte[] (which you seem to be doing), you need to make it an integer (or long).
I'm using a little trick here with Buffer.BlockCopy to convert the types. Just for safety I prefer using a few bytes from the data, but since SHA256 should already be random, a simple
BitConverter.ToInt32(data, [0..28])
should also do the trick.That should do it.