I'm attempting to write a method to generate an integer based on any given string. When calling this method on 2 identical strings, I need the method to generate the same exact integer both times.
I tried using .GetHashCode(); however, this is unreliable once I move the project to another machine, because GetHashCode() returns different values for the same string.
It is also important that the collision rate be VERY low. Custom methods I have written thus far produce collisions after just a few hundred thousand records.
The hash value MUST be an integer. A string hash value (like MD5) would cripple my project in terms of speed and loading overhead.
The integer hashes are being used to perform extremely rapid text searches, which I have working beautifully. However, it currently relies on .GetHashCode() and doesn't work when multiple machines get involved.
Any insight at all would be greatly appreciated.
MD5 hashing returns a byte array which could be converted to an integer:
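A minimal sketch of that conversion, assuming UTF-8 is an acceptable encoding for your strings. The class and method names (StableHash, FromString) are made up for illustration; only MD5.Create, ComputeHash, Encoding.UTF8.GetBytes and BitConverter.ToInt32 are real framework calls.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

static class StableHash
{
    // Hypothetical helper: maps a string to a 32-bit int taken from its MD5 hash.
    public static int FromString(string text, int startIndex = 0)
    {
        using (MD5 md5 = MD5.Create())
        {
            // Pin the encoding so the same string produces the same bytes on every machine.
            byte[] hash = md5.ComputeHash(Encoding.UTF8.GetBytes(text));

            // Read 4 of the 16 hash bytes, starting at startIndex (0 through 12).
            // BitConverter uses the machine's byte order (see BitConverter.IsLittleEndian),
            // so machines with different endianness would give different ints.
            return BitConverter.ToInt32(hash, startIndex);
        }
    }
}
```

Calling StableHash.FromString("some text") on two machines with the same byte order should then return the same integer, since nothing in it depends on the runtime's per-process string hashing.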
Of course, you are converting from a 128-bit hash to a 32-bit int, so some information is lost, which increases the possibility of collisions. You could try adjusting the second parameter of ToInt32 (the starting byte index, 0 through 12 for a 16-byte hash) to see if any specific ranges of the MD5 hash produce fewer collisions than others for your data.
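One way to run that experiment is to count collisions over your own keys for each starting index. This is only a sketch: CollisionCheck and Count are hypothetical names, and it assumes the keys fit in memory and are hashed as UTF-8. Keep in mind that any evenly distributed 32-bit hash is expected to produce roughly n²/2³³ colliding pairs for n distinct keys (about 10 for 300,000 records), so a handful of collisions at that scale does not by itself indicate a bad offset.

```csharp
using System;
using System.Collections.Generic;
using System.Security.Cryptography;
using System.Text;

static class CollisionCheck
{
    // Hypothetical helper: counts distinct keys whose 32-bit value
    // was already produced by an earlier, different key.
    public static int Count(IEnumerable<string> keys, int startIndex)
    {
        var seenKeys = new HashSet<string>();
        var seenHashes = new HashSet<int>();
        int collisions = 0;

        using (MD5 md5 = MD5.Create())
        {
            foreach (string key in keys)
            {
                // A repeated key is not a collision, so skip it.
                if (!seenKeys.Add(key)) continue;

                byte[] hash = md5.ComputeHash(Encoding.UTF8.GetBytes(key));
                int value = BitConverter.ToInt32(hash, startIndex);

                // HashSet.Add returns false when the value is already present.
                if (!seenHashes.Add(value)) collisions++;
            }
        }
        return collisions;
    }
}
```

Calling CollisionCheck.Count(keys, offset) for offsets 0, 4, 8 and 12 on a representative sample of your records lets you compare the four non-overlapping 4-byte ranges directly.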