Is there any Delphi implementation of MurMurHash 3? I tried implementing it myself, but my implementation is actually slower that the MurMurHash2. Is it normal? Is there any other implementation?
This is mine:
function MurMur3_32(const S: AnsiString; const Seed: LongWord=$9747b28c): LongWord;
const
c1 = $cc9e2d51;
c2 = $1b873593;
r1 = 15;
r2 = 13;
m = 5;
n = $e6546b64;
var
hash: LongWord;
len: LongWord;
k, k2: LongWord;
data: Integer;
begin
Hash := Seed;
len := Length(S);
//The default seed, $9747b28c, is from the original C library
// Initialize the hash to a 'random' value
hash := seed xor len;
// Mix 4 bytes at a time into the hash
data := 1;
while(len >= 4) do
begin
k := PLongWord(@S[data])^;
k := k*c1;
k := (k shl r1) or (k shr (32 - r1));
k := k*c2;
hash := hash xor k;
hash := ((hash shl r2) or (hash shr (32 - r2))) * m + n;
Inc(Data, 4);
Dec(len, 4);
end;
k2 := 0;
{ Handle the last few bytes of the input array
S: ... $69 $18 $2f
}
Assert(len <= 3);
if len = 3 then
k2 := k2 xor (LongWord(s[data+2]) shl 16);
if len >= 2 then
k2 := k2 xor (LongWord(s[data+1]) shl 8);
if len >= 1 then
begin
k2 := k2 xor (LongWord(s[data]));
k2 := k2 * c1;
k2 := (k2 shl r1) or (k2 shr (32 - r1));
k2 := k2 * c2;
hash := hash xor k2;
end;
// Do a few final mixes of the hash to ensure the last few
// bytes are well-incorporated.
len := Length(S);
hash := hash xor len;
hash := hash xor (hash shr 16);
hash := hash * $85ebca6b;
hash := hash xor (hash shr 13);
hash := hash * $c2b2ae35;
hash := hash xor (hash shr 16);
Result := hash;
end;
Disclaimer: I don't know if Seed
value is correct.
I have a murmurhash3 implementation in my FastDefaults unit at https://github.com/JBontes/FastCode
Here's the sourcecode for Murmurhash3:
The goto's are in place to simulate C's fallthough switch/case statement.
This kind of code is easier to write in asm, which has better register usage than the asm Delphi generates.
Why is your code slow
he only place I can see a problem is here:
Three indirect memory references and the read is misaligned!!
See here for more info: Purpose of memory alignment never mind the accepted answer, see the second answer.
A read of a double word should always happen on a dword boundary.
The
@pointer^
trick makes the compiler to add an extra level of indirection (and an extra round-trip to memory (Oops)).Use
{$pointermath on}
and address the pointer as an array.integer != pointer
Also it's an error to use
integer
to store a pointer.It will break in X64. Use NativeUInt instead.
The equivalent code in my version translates to:
Note that there are
noless unneeded indirect memory references and reads are properly aligned.Of course the asm version is slightly better, but the difference between the asm and pascal version should be less than between the two Pascal versions.
This is where I think your performance is wasted.
The misaligned reads eat up a lot of (wasted) cycles on X86.
On other processors the slowdown is even worse.
Other issues with the code
Limiting your implementation to strings is a bad thing to do.
What if I want to hash a record?
Do not force binary data into strings.
Strings are a bad fit for binary data.
Just use an untyped parameter and pointers.
Seed values
I don't think there is a correct seed value. The way I understood it the seed is there so you can chain calls to Murmurhash.
You use the output of the first hash as the seed for the second run.
That way you can feed 2 variables in and get the same output as if you processed those two variables in one go.
Let me know how they perform against your code.