what does clustering( in the collision)in hash mean?

1.8k views Asked by At

The main problem with linear probing is clustering, many consecutive elements form groups and it starts taking time to find a free slot or to search an element.

Why consecutive element from group & how does it affect the time to find free slot?

2

There are 2 answers

0
Aki Suihkonen On

The wanted output of hash function is to scatter say 100 strings to randomly over say 200 "pigeonslots". In case of collision, ie already occupied slot the linear scan will search the next unoccupied slot, making immediately a group of at least two (it may also connect two groups). When ever there's a collision to a cluster, linear probing adds the cluster by one new key, whose original position should have been anywhere in the cluster.

Many fast to evaluate hash functions suffer from the problem of not distributing keys evenly. When also the input data is not evenly distributed, these two phenomenom emphasize each other and in case of linear probing may lead to a significant amount of keys to cluster. In effect this will make not only insertion but also searching an O(n) problem instead of O(1).

0
Sumeet On

Its not the case that always consecutive elements will form clusters.

Condictory example

Suppose you have an hash table of 100 entries: and the hash function is:

h(x) = x mod 100;

Say you insert elements:

948,748,172,973,473,572,72

The clusters formed will be:

cluster 1: 948(position 48),748(position 49)(clearly elements are not consecutive)

cluster 2: 172(position 72),973(position 73),473(position 74),572(position 75),72(position 76)(clearly elements not consecutive).

YES, clustering affects the time to find a free slot, because in linear probing, we scan the hash table to find the very next free slot, so due to clusters, linear scan will take more time due to clusters formed, but its only due to linear scan in case of collision.