I understand the unordered_ stl containers keep a number of buckets that vary in number depending on the number of elements in the container. When inserting, if breaking a certain limit, the container will rehash to use more buckets, so each bucket is less full and faster to search. And that invalidates the iterators.

This means I shouldn't keep iterators to an unordered container. Except I can, if I update them after a rehash. But I couldn't find a reliable way to check whether an insert (be it emplace or whatever) caused a rehash.

Should I monitor bucket_count()?

cppreference says Rehashing occurs only if the new number of elements is greater than max_load_factor()*bucket_count(). Is that garanteed? Would it be reliable to do the following?

bool will_rehash = (max_load_factor()*bucket_count()) > size()+1;

1 Answers

2
Community On Best Solutions

I don't think that re-hashing (as the process where actually a hash function is engaged) takes place once hash-map is growing:

  1. calculating a hash is (relatively) computationally expensive
  2. see below my example I quickly compiled, where I crafted a custom hash functor, which keeps track of times it is called:
    • whenever bucket count increases, there's no indication that hash function was called => we can infer that re-bucketing takes place instead of re-hashing

That means, that one may watch over bucket count to infer whether iterator should be invalidated (predicated the invalidation happens at the moment of re-bucketing)

#include <iostream>
#include <string>
#include <unordered_map>

using namespace std;

typedef unordered_map<string, string> str_map;

struct my_hash {
    void        reset(void) { calls_ = 0; }
    size_t      calls(void) { return calls_; }
    size_t      operator()(const string& key) const {
                 ++my_hash::calls_;
                 return hash_fn_(key);
                }
 private:
        str_map native_hash_fn_;
str_map::hasher hash_fn_{native_hash_fn_.hash_function()};
  static size_t calls_;
};

size_t my_hash::calls_ = 0;


int main ()
{
 typedef unordered_map<string, string, my_hash> myMapType;
 myMapType mymap(1, my_hash());
 myMapType::hasher hash = mymap.hash_function();

 cout << "mymap has " << mymap.bucket_count() << " buckets" << endl;

 mymap["abc1"] = "blah"; // add 3 values
 mymap["abc2"] = "blah"; // just to see the hash calls tracking
 mymap["abc3"] = "blah"; // is working
 cout << "hash calls: " << hash.calls() << endl;
 hash.reset();

 for(size_t i=0; i < 10; ++i) {
  mymap[to_string(i)] = "blah";
  cout << "buckets: " << mymap.bucket_count() << endl;
  cout << "hash calls: " << hash.calls() << endl;
  hash.reset();
 }

 cout << "mymap has " << mymap.bucket_count() << " buckets" << endl;
}

Output:

mymap has 2 buckets
hash calls: 3
buckets: 5
hash calls: 1
buckets: 11
hash calls: 1
buckets: 11
hash calls: 1
buckets: 11
hash calls: 1
buckets: 11
hash calls: 1
buckets: 11
hash calls: 1
buckets: 11
hash calls: 1
buckets: 23
hash calls: 1
buckets: 23
hash calls: 1
buckets: 23
hash calls: 1
mymap has 23 buckets

Disclaimer: though, I strongly believe that it's not a good programming practice to refer iterators after the container has changed in size. Even if it might not cause some fatal / undefined behaviors, it might (and most likely will) cause some side effects onto the programming logic. In case with the hash-map, consider a situation with a begin() iterator: after a few insertions/emplacements it won't be a true begin anymore. Even if re-bucketing did not occur, some new entries might get installed in front of it (which might affect the programming logic).