What is the hash method producing 128-bit output (32 hex chars) with lowest collision likelyhood?

503 views Asked by At

I need to utilize the UUID datatype (128bit) for storing my hashes. The goal is to be able quickly calculate/identify different records by comparing it rather than millions of 1-10k char long strings. So the goal here is not the security (reverse vulnerability) but I suspect it goes hand in hand with that "uniqueness" or low collision rate that a particular hashing method offers.

I am not limited to any DB functionality here (although it's a pity DBs are usually quite behind in offering better/stronger hashing functions/algorithms, not mentioning wider (>128bit) datatypes to store UIDs) so can use any open source implementation.

So my original thinking was to use MD5 but I read somewhere it might not even utilize full potential of 128bits storing some control segments in it.

Then there's also a quite fast MurmurHash that I could use.

Lastly, I was thinking maybe, it would be better to take something solid, like SHA-3 and just take last 32chars from it?

(Where) Could I get some advice on the best (lowest collision likelyhood) between those aforementioned (and other possible) methods?

1

There are 1 answers

5
bk2204 On

If you care about collisions, you should use none of MD5, SHA-1, or MurmurHash. MurmurHash is non-cryptographic, which means collisions are expected, and MD5 and SHA-1 are broken.

Appropriate options are SHA-2 (e.g., SHA-256), SHA-3, BLAKE2, or BLAKE3. All of these are cryptographic hash functions, and all of them provide very good cryptographic security, including equally good collision resistance for a given output size. My recommendation is to use a 256-bit output, because that provides 128-bit collision resistance; a 128-bit output only provides 64-bit collision resistance, which is not very good.

If you're going with a 256-bit output, SHA-256 is fastest if your CPU accelerates it (some recent ARM and Intel CPUs and most recent AMD CPUs do), and otherwise BLAKE3 tends to be the fastest. BLAKE2b-256 is still very fast and provides slightly better security than BLAKE3. SHA-3-256 is also fine, but slower.

If you're going with a 128-bit output, you can use a truncated SHA-256, SHAKE128 (which is in the Keccak family along with the SHA-3 algorithms), BLAKE2b-128 (or BLAKE2s-128), or BLAKE3.