Murmurhash2 Unsigned Int overflow

777 views Asked by At

I'm currently trying to implement a hashtable/trie, but when I pass in parameters to murmurhash2, it gives back a number but I get run time errors of unsigned int overflow:

test.c:53:12: runtime error: unsigned integer overflow: 24930 * 1540483477 cannot be represented in type 'unsigned int'

test.c:60:4: runtime error: unsigned integer overflow: 2950274797 * 1540483477 cannot be represented in type 'unsigned int' 6265

I've put a bunch of stars(*) on the lines 53 and 60

I'm not sure if I'm passing some parameters wrong. Any help would be greatly appreciated!

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

unsigned int MurmurHash2 ( const void * key, int len, unsigned int seed );

int main(void)
{
   const char* s= "aa";
   unsigned int number= MurmurHash2 (s, (int)strlen(s), 1) % 10000;
   printf("%u\n", number);
}

unsigned int MurmurHash2 ( const void * key, int len, unsigned int seed )
{
// 'm' and 'r' are mixing constants generated offline.
// They're not really 'magic', they just happen to work well.

const unsigned int m = 0x5bd1e995;
const int r = 24;

// Initialize the hash to a 'random' value

unsigned int h = seed ^ len;

// Mix 4 bytes at a time into the hash

const unsigned char * data = (const unsigned char *)key;

while(len >= 4)
{
    unsigned int k = *(unsigned int *)data;

    k *= m;
    k ^= k >> r;
    k *= m;

    h *= m;
    h ^= k;

    data += 4;
    len -= 4;
}

// Handle the last few bytes of the input array

switch(len)
{
case 3: h ^= data[2] << 16;
case 2: h ^= data[1] << 8;
case 1: h ^= data[0];
        h *= m; ************************************************
};

// Do a few final mixes of the hash to ensure the last few
// bytes are well-incorporated.

h ^= h >> 13;
h *= m;   **************************************
h ^= h >> 15;

return h;
}
2

There are 2 answers

0
nwellnhof On BEST ANSWER

It seems that you're building with the UBSan option -fsanitize=unsigned-integer-overflow or some other option like -fsanitize=integer that enables this check. The documentation says:

Note that unlike signed integer overflow, unsigned integer is not undefined behavior. However, while it has well-defined semantics, it is often unintentional, so UBSan offers to catch it.

In the case of MurmurHash, unsigned integer overflow in multiplications is fully intentional, so you should disable the option.

  • If you use -fsanitize=unsigned-integer-overflow explicitly, remove it.
  • If it is enabled by another option, pass -fno-sanitize=unsigned-integer-overflow.
  • Alternatively, annotate the function MurmurHash2 with __attribute__((no_sanitize("unsigned-integer-overflow"))).

Another note: Your code seems to be copied from the 32-bit reference implementation of MurmurHash2 which assumes 32-bit ints. You should consider using uint32_t instead.

6
Myst On

unsigned int has a system dependent number of bits.

On most systems, this number is 32 bits (4 bytes) but some systems might use different sizes (i.e. 64bits (8 bytes) on some machines).

However, murmur hash "words" are a specific size. The 64 bit variant requires a 64bit unsigned type and the 32 bit variant requires a 32bit unsigned type.

This inconsistency can be resolved by using the uint64_t or uint32_t types defined in <stdint.h>.

I would add that the suffix UL (unsigned long) should probably be added to any numerical constants you use. i.e. 2950274797UL * 1540483477UL.

As indicated by @nwellnhof, your code seems to use the 32 bit variant of the algorithm.

The overflow in the multiply instruction is normal in these cases (where the result larger than the number of available bits and is truncated). This loss of data is acceptable as part of the hashing process.

Consider informing the compiler about the expected result using:

 h = (uint32_t)(((uint64_t)h * m) & 0xFFFFFFFF)

Good Luck!