In C++ for every unordered associative containers( like unordered_map
, unordered_set
, unordered_multimap
) we need to define a hash function. As pointed out by Wikipedia,
struct X{int i,j,k;};
struct hash_X{
size_t operator()(const X &x) const{
return hash<int>()(x.i) ^ hash<int>()(x.j) ^ hash<int>()(x.k);
}
};
struct hash_X
is a custom hashing function for struct X
. But what does this function do? Why do we need a hash function? Can there be any other type of custom hashing functions? If so how do we compare the efficiency between any two such functions.
The goal of a hashing function is to map the contents of an arbitrary data structure to an integer in such a way that most of the items you're likely to encounter map to different integers, and that the complete set of items you're likely to encounter together spreads out evenly over the set of integers. With such a function in hand, it becomes easy to build a container (such as
unordered_map
) that looks up arbitrary items very quickly.I realize that definition is somewhat abstract. More concretely, consider the example you gave above from Wikipedia. It XORs the
i
,j
andk
fields of the structure together to form a hash value. This is a valid hashing function (it merged the structure down to a single integer). But, ifi
,j
andk
all have similar ranges of values, then it may not be a terribly great hashing function. For example,(1,2,3)
and(3,1,2)
both will hash to the same value.An ideal hashing function usually looks more like a random number generator: For predictable inputs, it gives seemingly random outputs. (But remember, the same input must always give the same output.) The best hash function for your data structure really depends on what sort of data you'll be hashing.
This set of lecture notes looks like it covers most of the important points: http://www.cs.cornell.edu/Courses/cs312/2008sp/lectures/lec21.html
You can find others by googling.