How does unordered_set determine the inserting order in c++?

5.1k views Asked by At

I know that people use unordered_set when they don't care about the order of the elements in the set. However, when I run the sample program on C++ Shell

#include <iostream>
#include <unordered_set>
#include <string>

int main()

{
std::unordered_set<std::string> inputSet;
inputSet.insert("Hello world");
inputSet.insert("Abcdef");
inputSet.insert("This is the test string...");

for(const auto &val : inputSet)
  std::cout << val.c_str() << std::endl;

return 0;}

it gives me

This is the test string...
Abcdef
Hello world

And I tried to run it for 3 or 4 times, it still gives me the same output which implies that there is a way that unordered_set determine the inserting order.

Can someone explain how does unordered_set determine the inserting order?

Sorry if it has been asked before, I've searched online for a while and I cannot find a specific answer to this question. Thanks in advance.

4

There are 4 answers

2
WhiZTiM On BEST ANSWER

There is no specific ordering... It uses the default std::hash to hash the string. And whatever the hash value is, it is converted into an appropriate bucket index in the container..

The hash value we are talking about can be gotten:

auto hello = std::hash<std::string>()("Hello world");
auto abcd = std::hash<std::string>()("Abcdef");
auto test = std::hash<std::string>()("This is the test string...");

For a particular STL implementation, this resolves to:

Hello maps to: 14420674105493498572
abcd maps to: 10830572898531769673
test maps to: 13068738153895491918

See it Live on C++Shell

The value is usually converted to an appropriate bucket index by applying % operator. Again the std::unordered_set's iterator isn't mandated to sequentially iterate through all the buckets (what about collisions?). So, you should not rely on any ordering you observe from the iterators between program runs.


From C++14, std::hash<> is explicitly permitted to produce different results between different program runs. To quote:

Hash functions are only required to produce the same result for the same input within a single execution of a program; this allows salted hashes that prevent collision DoS attacks.

3
Mikel F On

As stated here http://en.cppreference.com/w/cpp/container/unordered_set

Internally, the elements are not sorted in any particular order, but organized into buckets. Which bucket an element is placed into depends entirely on the hash of its value. This allows fast access to individual elements, since once a hash is computed, it refers to the exact bucket the element is placed into.

So it either uses a default or user provided hash algorithm to sort into hash buckets.

0
Dietmar Kühl On

The order in an std::unordered_set<T> is, well, unordered. However, assuming a deterministic hash is used and the same order of insert operations is done, different runs of a program will have the elements in the same order. Inserting the elements in a different order and/or using a hash which produces different values for different runs will produce a different order of the elements.

0
Gordon Bean On

I also ran into this same phenomenon. At first I thought it was a quirk of the cling compiler; however, upon further investigation, I found this answer: Complexity of std::unordered_set iterator traversal

Here it was suggested that the iteration is by reverse-bucket-creation order. I ran a test of my own and confirmed this hypothesis.

test.cpp

#include <string>
using std::string;

#include <iostream>
using std::cout;
using std::endl;

#include <unordered_set>
using std::unordered_set;

int main() {
    unordered_set<string> stuff(10);
    stuff.insert("yep");
    stuff.insert("123");
    stuff.insert("xyz");
    stuff.insert("foo");
    stuff.insert("bar");
    stuff.insert("baz");
    stuff.insert("abc");

    for (const auto& word : stuff) {
        cout << word << endl;
    }

    cout << "bucket count: " << stuff.bucket_count() << endl;

    cout << "yep: " << stuff.bucket("yep") << endl;
    cout << "123: " << stuff.bucket("123") << endl;
    cout << "xyz: " << stuff.bucket("xyz") << endl;
    cout << "foo: " << stuff.bucket("foo") << endl;
    cout << "bar: " << stuff.bucket("bar") << endl;
    cout << "baz: " << stuff.bucket("baz") << endl;
    cout << "abc: " << stuff.bucket("abc") << endl;
}
% g++ -std=c++17 -o test test.cpp && ./test
abc
baz
foo
bar
xyz
123
yep
bucket count: 11
yep: 8
123: 10
xyz: 9
foo: 5
bar: 9
baz: 6
abc: 3

As the buckets are created, they (their indexex) are (likely) pushed to the front of a singly-linked list (which iterates in reverse insertion order: 3, 6, 5, 9, 10, 8.

Note: the buckets are also SLLists (and thus iterate in reverse-insertion order; this is why bar comes before xyz in bucket 9 below).

So the iteration happens:

3)  abc
6)  baz
5)  foo
9)  bar, xyz
10) 123
8)  yep

So if you have no collisions when adding your items (and your table doesn't grow) then you will have reverse-insertion order.

But if you insert more items and the table grows (and all the items are re-inserted), then your iteration order will reflect the reverse-bucket-creation order of the re-inserted items (and my guess is the items are inserted in order of iteration; so the more you insert and the more the table grows, the obfuscated the original insertion order becomes).