The main problem with linear probing is clustering, many consecutive elements form groups and it starts taking time to find a free slot or to search an element.
Why consecutive element from group & how does it affect the time to find free slot?
The main problem with linear probing is clustering, many consecutive elements form groups and it starts taking time to find a free slot or to search an element.
Why consecutive element from group & how does it affect the time to find free slot?
Its not the case that always consecutive elements will form clusters.
Condictory example
Suppose you have an hash table of 100
entries: and the hash function is:
h(x) = x mod 100;
Say you insert elements:
948,748,172,973,473,572,72
The clusters formed will be:
cluster 1: 948(position 48),748(position 49)
(clearly elements are not consecutive)
cluster 2: 172(position 72),973(position 73),473(position 74),572(position 75),72(position 76)
(clearly elements not consecutive).
YES, clustering affects the time to find a free slot, because in linear probing
, we scan the hash table to find the very next free slot, so due to clusters, linear scan will take more time due to clusters formed, but its only due to linear scan in case of collision.
The wanted output of hash function is to scatter say 100 strings to randomly over say 200 "pigeonslots". In case of collision, ie already occupied slot the linear scan will search the next unoccupied slot, making immediately a group of at least two (it may also connect two groups). When ever there's a collision to a cluster, linear probing adds the cluster by one new key, whose original position should have been anywhere in the cluster.
Many fast to evaluate hash functions suffer from the problem of not distributing keys evenly. When also the input data is not evenly distributed, these two phenomenom emphasize each other and in case of linear probing may lead to a significant amount of keys to cluster. In effect this will make not only insertion but also searching an O(n) problem instead of O(1).