I was working on a problem where I am trying to take union and was trying a very simple benchmarking code to see the efficiency.
The code is very simple (inserting) few million elements into a set. To keep it simple lets keep set_union away from the discussion.
Test code :
int main() {
// Setting up
int num = 40000000;
auto it = v.begin();
std::vector<int> a;
a.reserve(num);
for (int i =0;i < num ; ++i) {
a.push_back(i);
}
// Method 1
{
std::set<int> v;
for (int i= 0 ; i< num ; ++i) {
v.insert(a[i]);
}
}
// Method 2
{
std::set<int> v;
auto it = v.begin();
for (int i= 0 ; i< num ; ++i) {
it = v.insert(it,a[i]);
}
}
// Method 3
{
std::set<int> v;
auto it = v.begin();
for (int i= 0 ; i< num ; ++i) {
it = std::next(v.insert(it,a[i]));
}
}
// Method 4
{
std::set<int> v;
auto it = v.begin();
for (int i= 0 ; i< num ; ++i) {
it = v.insert(it,i); ++it;
}
}
// Method 5 : idiomatic
{
std::set<int> v;
std::copy(a.begin(), a.end(), std::inserter(v,v.end()));
}
return 0;
}
Method 1 : slowest (as expected) : ~38 sec Method 2 : Fastest (as expected) : ~8 sec Method 3 : ~20 sec Method 4 : ~20 sec Method 5 : ~20 sec
The results make sense method 3 and 4 are the same and on digging into mehtod 5 i found that std::inserter creates an output iterator which on assignment does exactly the same thing (or translates to the same) as method 3/4.
Is this intentional ? Why can't the algorithms be written in a way that they translate to the most efficient insertion way ? Method 2 gives the exact accurate hint whereas 3,4,5 increment the iterator to the set.end() (in this case when i am inserting a sorted range , the std::next(insert(pos,new_max_element)) == set::end()) and always give that as a hint for insertion.
This makes using stl algorithms inefficient if i use std::inserter to pass an iterator to such ordered containers. on a side note : i do not understand how set_union can work in linear time if the inserting operation to another set will be logarithmic. for exmaple set_union(set_1.begin(), set_1.end(), set_2.begin(), set_2.end(), std::inserter(output_set, output_set.end()) . Sorted vectors is fine but set ? Can any one put some links or referrals to complexity analysis ?
Also it will be great if someone can explain or give some referrals to complexity analysis that proves that inserting into a set with correct hint (for example the next number is always smaller or larger then the current being inserted) gives your algorithm an amoritzed constant complexity instead of logn.