How do I hash integer and string inputs using murmurhash3?

I'm looking to get a hash value for string and integer inputs. Using murmurhash3, I'm able to do it for strings but not integers:

pip install murmurhash3
import mmh3
mmh3.hash(34)

Returns the following error:

TypeError: a bytes-like object is required, not 'int'

I could convert it to bytes like this:

mmh3.hash(bytes(34))

But then I get an error if the input is a string.

How do I overcome this without converting the integer to string?

There is 1 answer

ShadowRanger

How do I overcome this without converting the integer to string?

You can't. Or more precisely, you need to convert it to bytes or str in some way, but it needn't be a human-readable text form like b'34'/'34'. A common approach on Python 3 would be:

my_int = 34  # Or some other value
my_int_as_bytes = my_int.to_bytes((my_int.bit_length() + 7) // 8, 'little')

which makes a minimal raw bytes representation of the original int, whatever its size. For 34, you'd get b'"' (it only takes one byte to store it, so you're basically getting a bytes object holding its ordinal value). It still works for larger ints (unlike mucking about with chr), and it's always as small as possible: you get 8 bits of data per byte, rather than the titch over 3 bits per byte you'd get by converting to a decimal text string.
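To handle both input types through one call, the conversion can be wrapped in a small dispatcher. The following is only a sketch: the names to_hash_input and hash_value are made up for illustration, and using signed=True with one spare bit (so negative ints work too) is my assumption, not something the question asks for. It means the output is not always the minimal form shown above. Note also that bytes(34), as tried in the question, builds 34 zero bytes rather than encoding the value 34.

```python
def to_hash_input(value):
    """Return a bytes object for a str, bytes-like or int value (hypothetical helper)."""
    if isinstance(value, str):
        # Encode explicitly so the function always returns bytes.
        return value.encode('utf-8')
    if isinstance(value, (bytes, bytearray)):
        return bytes(value)
    if isinstance(value, int):
        # Assumption: signed=True plus one spare bit, so negative ints work too.
        # This can use one byte more than the unsigned form (128 takes 2 bytes).
        length = (value.bit_length() + 8) // 8
        return value.to_bytes(length, 'little', signed=True)
    raise TypeError('unsupported type: %s' % type(value).__name__)


def hash_value(value, seed=0):
    """Hash a str, bytes or int with mmh3 (imported lazily; third-party package)."""
    import mmh3
    return mmh3.hash(to_hash_input(value), seed)


print(to_hash_input(34))    # b'"'
print(to_hash_input('34'))  # b'34'
print(to_hash_input(-1))    # b'\xff'
```

One caveat with this design: the int 34 and the string '"' both map to b'"', so they would hash identically. If ints and strings share one key space and that matters, prefixing a type tag byte before hashing is one way around it.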

If you're on Python 2 (WHY?!? It's been end-of-life for nearly a year), int.to_bytes doesn't exist, but you can fake it with moderate efficiency in various ways, e.g. (only handling non-negative values, unlike to_bytes which handles signed values with a simple flag):

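from binascii import unhexlify

hex_digits = '%x' % (my_int,)
hex_digits = '0' * (len(hex_digits) % 2) + hex_digits  # unhexlify rejects odd-length input
my_int_as_bytes = unhexlify(hex_digits)  # note: big-endian byte order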
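Two details of the hex-based route are worth checking before mixing it with the to_bytes approach: unhexlify needs an even number of hex digits, and the bytes come out big-endian, whereas the Python 3 snippet above uses 'little'. The sketch below runs on Python 3 purely to compare the two; int_to_bytes_hex is a made-up helper name.

```python
from binascii import unhexlify


def int_to_bytes_hex(n):
    """Hex-based conversion for non-negative ints (hypothetical helper)."""
    hex_digits = '%x' % n
    if len(hex_digits) % 2:
        # Left-pad to an even digit count, which unhexlify requires.
        hex_digits = '0' + hex_digits
    return unhexlify(hex_digits)


n = 1000  # 0x3e8: three hex digits, so padding is needed
length = (n.bit_length() + 7) // 8

print(int_to_bytes_hex(n))                                     # b'\x03\xe8'
print(int_to_bytes_hex(n) == n.to_bytes(length, 'big'))        # True
print(int_to_bytes_hex(n)[::-1] == n.to_bytes(length, 'little'))  # True
```

Since the byte order differs, multi-byte values will generally hash differently between the two approaches. If hashes need to match across Python versions, pick one byte order and use it on both sides. The two also differ for 0: the hex route gives b'\x00', while the to_bytes formula gives an empty bytes object.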