Cassandra hashing algorithm with composite keys


I'm trying to understand what algorithm Cassandra uses to generate murmur3 hashes of composite partition keys. I know I can obtain the value directly from CQL, but I want to reproduce Cassandra's behaviour for any given tuple directly from Java/Scala code.

For simple partition keys, the following function computes the correct value in many cases (from reading the source code, I know it is not an exact match):

long l = com.google.common.hash.Hashing.murmur3_128().hashString("my-string", Charset.forName("UTF-8")).asLong();

What if the partition key has two columns?

Hashing the concatenation of the two strings does not give the same value as Cassandra.

1 answer

Answer by Nicola Ferraro (best answer)

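Thanks for giving me more details about the algorithm. I wrote some sample code to share the solution. Each component of the partition key is written as a 2-byte length, followed by the component's bytes and a single zero byte, and the hash is computed over the resulting byte array rather than over the plain concatenation of the strings.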

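import com.google.common.hash.Hashing;
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.nio.charset.StandardCharsets;

// Serialize the composite partition key; the stream calls can throw IOException.
byte[] keyBytes;
try (ByteArrayOutputStream bos = new ByteArrayOutputStream();
     DataOutputStream out = new DataOutputStream(bos)) {

    String[] keys = new String[] {"key1", "key2"};
    for (String key : keys) {
        byte[] arr = key.getBytes(StandardCharsets.UTF_8);
        out.writeShort(arr.length);     // 2-byte big-endian length
        out.write(arr, 0, arr.length);  // component value
        out.writeByte(0);               // end-of-component byte
    }
    out.flush();
    keyBytes = bos.toByteArray();
}

// asLong() returns the first 64 bits of the 128-bit murmur3 hash.
long hash = Hashing.murmur3_128().hashBytes(keyBytes).asLong();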
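To see why hashing the plain concatenation gives a different value, it can help to look at the bytes that actually get hashed. The following is a minimal sketch of the same serialization using only the JDK; the class name `CompositeKeySketch` and the helpers `serialize` and `toHex` are hypothetical names chosen for illustration, and the sketch assumes every key component is a text column encoded as UTF-8. It only builds the byte layout and does not compute the hash itself.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only.
final class CompositeKeySketch {

    // Writes each component as: 2-byte big-endian length, UTF-8 bytes, one 0x00 byte.
    // Assumes all components are text columns.
    static byte[] serialize(String... components) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (String component : components) {
            byte[] value = component.getBytes(StandardCharsets.UTF_8);
            if (value.length > 0xFFFF) {
                throw new IllegalArgumentException("component does not fit a 2-byte length");
            }
            out.write((value.length >>> 8) & 0xFF); // high byte of the length
            out.write(value.length & 0xFF);         // low byte of the length
            out.write(value, 0, value.length);      // component value
            out.write(0);                           // end-of-component byte
        }
        return out.toByteArray();
    }

    // Renders bytes as lowercase hex so the layout is easy to inspect.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }
}
```

For the components "key1" and "key2", `toHex(serialize("key1", "key2"))` yields `00046b6579310000046b65793200` (14 bytes), whereas the UTF-8 bytes of the concatenated string "key1key2" are `6b6579316b657932` (8 bytes). The two inputs differ, so their hashes differ as well.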