Universal hashing performs worse than modulo hashing, what is wrong?


In case you are not familiar with universal hashing, it's mainly an attempt to guarantee a low number of collisions (as opposed to, say, plain old modulo hashing), using some rather simple math involving randomness. The problem is that it doesn't work for me:

size_t hash_modulo(const int value) {
    return (size_t) (value % TABLE_SIZE);
}

// prime 491 is used because it's > 128, which is the size of the hash table
size_t hash_universal(const int value) {
    const size_t a = (size_t) (rand() % 491 + 1);
    const size_t b = (size_t) (rand() % 491);
    //printf("a: %zu, b:%zu\n", a, b);
    return ((a * value + b) % 491) % TABLE_SIZE;
}

I test modulo hashing first and determine the longest chain length (chain length meaning the size of a hash bucket):

size_t get_max_chain_length(int input[TABLE_SIZE], size_t (*hash_function)(const int)) {
    HashTable *hash_table = hash_table_create(hash_function);
    if (!hash_table) {
        return 0;
    }

    for (size_t i = 0; i < TABLE_SIZE; ++i) {
        hash_table_add(hash_table, input[i]);
    }

    size_t maximum_chain_length = 0;
    for (size_t j = 0; j < TABLE_SIZE; ++j) {
        const size_t length = length_of_(hash_table->rows[j]);
        maximum_chain_length = (length > maximum_chain_length) ? length : maximum_chain_length;
    }

    //hash_table_print(hash_table);
    hash_table_destroy(hash_table);

    return maximum_chain_length;
}

I pick one of the inputs that led to a really long chain (i.e., one that performs badly with plain modulo) and run that one through universal hashing. Since universal hashing uses randomness, I can take a constant input and still get varying results.

And here comes the problem: I try 100 random input arrays of size 128 each and calculate the average longest chain and the total longest chain, but both algorithms perform similarly.

You can check my main in my repo.
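The gist of it is roughly this (a simplified sketch, not a verbatim copy of the repo; NUM_RUNS and fill_with_random_ints are just illustrative names, and TABLE_SIZE is the 128 from above):

#include <stdio.h>
#include <stdlib.h>

#define NUM_RUNS 100  /* illustrative; TABLE_SIZE (128) and the functions shown above are assumed */

static void fill_with_random_ints(int array[TABLE_SIZE]) {
    for (size_t i = 0; i < TABLE_SIZE; ++i) {
        array[i] = rand();
    }
}

int main(void) {
    size_t total_modulo = 0, total_universal = 0;
    size_t worst_modulo = 0, worst_universal = 0;

    for (size_t run = 0; run < NUM_RUNS; ++run) {
        int input[TABLE_SIZE];
        fill_with_random_ints(input);

        const size_t chain_modulo    = get_max_chain_length(input, hash_modulo);
        const size_t chain_universal = get_max_chain_length(input, hash_universal);

        total_modulo    += chain_modulo;
        total_universal += chain_universal;
        if (chain_modulo > worst_modulo)       worst_modulo = chain_modulo;
        if (chain_universal > worst_universal) worst_universal = chain_universal;
    }

    printf("Average maximum chain length: %.2f (modulo) vs %.2f (universal)\n",
           (double) total_modulo / NUM_RUNS, (double) total_universal / NUM_RUNS);
    printf("Total maximum chain length: %zu (modulo) vs %zu (universal)\n",
           worst_modulo, worst_universal);
    return 0;
}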

My question is: is that result to be expected? Does universal hashing perform no better on input that already performed poorly with modulo? Or did I just screw up my implementation (more likely)?

Thanks a lot in advance!


There are 2 answers

SomeWittyUsername

Well, why do you think modulo is bad? If the input is random and sufficiently large, modulo should yield a uniformly distributed result. Universal hashing (as your link states) provides protection against non-random (i.e., malicious) input, which isn't the case here.
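For instance, a quick standalone check (just a sketch, not taken from your repo) shows how evenly plain modulo spreads random ints over 128 buckets:

#include <stdio.h>
#include <stdlib.h>

#define TABLE_SIZE 128

int main(void) {
    size_t buckets[TABLE_SIZE] = {0};
    const size_t n = 1000000;

    /* hash a large number of random ints with plain modulo */
    for (size_t i = 0; i < n; ++i) {
        buckets[rand() % TABLE_SIZE]++;
    }

    size_t min = buckets[0], max = buckets[0];
    for (size_t b = 1; b < TABLE_SIZE; ++b) {
        if (buckets[b] < min) min = buckets[b];
        if (buckets[b] > max) max = buckets[b];
    }

    /* with random input, min and max stay close to n / TABLE_SIZE (7812.5 here),
       i.e. modulo is already nearly uniform */
    printf("expected per bucket: %.1f, min: %zu, max: %zu\n",
           (double) n / TABLE_SIZE, min, max);
    return 0;
}

With random keys there is very little left for universal hashing to improve on.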

Armali

In case you are not familiar with universal hashing, it's mainly an attempt to guarantee a low number of collisions …

An "attempt to guarantee" is no guarantee.

… (as opposed to, say, plain old modulo hashing) …

The linked article says that even simpler hash functions [using modulo] are approximately universal.

I generate random input and choose one which leads to a big bucket using modulo. Then I use exactly the same input with universal hashing to check whether the result improves. And according to the specification, the maximum chain length should at least reduce a bit.

100 random input arrays of size 128 each aren't a particularly large input. I ran the program from your repo eight times; in five of those runs, your universal hashing reduced the "Average maximum chain length" by around 10 %; in three runs, your universal hashing increased the "Average maximum chain length" by a similar amount. I notice that the maximum chain length with universal hashing is constant within each run.

To sum up, there's no guarantee that one hash method is always better than another, but your universal hashing does seem to keep its performance promise by being better more often than not.