Hash function for Unordered associative containers in C++

476 views Asked by At

In C++ for every unordered associative containers( like unordered_map, unordered_set, unordered_multimap) we need to define a hash function. As pointed out by Wikipedia,

struct X{int i,j,k;};

struct hash_X{
  size_t operator()(const X &x) const{
    return hash<int>()(x.i) ^ hash<int>()(x.j) ^ hash<int>()(x.k);
  }
};

struct hash_X is a custom hashing function for struct X. But what does this function do? Why do we need a hash function? Can there be any other type of custom hashing functions? If so how do we compare the efficiency between any two such functions.

2

There are 2 answers

0
Joe Z On BEST ANSWER

The goal of a hashing function is to map the contents of an arbitrary data structure to an integer in such a way that most of the items you're likely to encounter map to different integers, and that the complete set of items you're likely to encounter together spreads out evenly over the set of integers. With such a function in hand, it becomes easy to build a container (such as unordered_map) that looks up arbitrary items very quickly.

I realize that definition is somewhat abstract. More concretely, consider the example you gave above from Wikipedia. It XORs the i, j and k fields of the structure together to form a hash value. This is a valid hashing function (it merged the structure down to a single integer). But, if i, j and k all have similar ranges of values, then it may not be a terribly great hashing function. For example, (1,2,3) and (3,1,2) both will hash to the same value.

An ideal hashing function usually looks more like a random number generator: For predictable inputs, it gives seemingly random outputs. (But remember, the same input must always give the same output.) The best hash function for your data structure really depends on what sort of data you'll be hashing.

This set of lecture notes looks like it covers most of the important points: http://www.cs.cornell.edu/Courses/cs312/2008sp/lectures/lec21.html

You can find others by googling.

0
Erbureth On

Short answer: To look-up the elements really fast.

As opposed to ordered containters, which store elements into some form of red-black trees (or another AVL tree), unordered ones uses indexed buckets to contain the nodes. Retrieving a bucket by index has O(1) complexity.

Hash function is a function, that takes a element and transform it into such integer index.

Consequently, as a domain of indices is smaller than domain of all the elements, collision may occur and more elements can be placed into one bucket, and thus reducing the effectiveness of element look-up. Therefore the least probability of a collision is definitely the property of a hash function to strive for. Another one should be the effectiveness of the hash computation.

See Perfect Hash Function for some more analysis