std set insert and union efficient way

1.3k views Asked by At

I was working on a problem where I am trying to take union and was trying a very simple benchmarking code to see the efficiency.

The code is very simple (inserting) few million elements into a set. To keep it simple lets keep set_union away from the discussion.

Test code :

int main() {
  // Setting up
    int num = 40000000;
  auto it = v.begin();
  std::vector<int> a;
  a.reserve(num);
  for (int i =0;i < num ; ++i) {
    a.push_back(i);
  }

  // Method 1
  { 
    std::set<int> v;    
    for (int i= 0 ; i< num ; ++i) {
       v.insert(a[i]);
    }
  }

  // Method 2
  { 
    std::set<int> v;
    auto it = v.begin();
    for (int i= 0 ; i< num ; ++i) {
       it = v.insert(it,a[i]);
    }
  }  

  // Method 3
  { 
    std::set<int> v;
    auto it = v.begin();
    for (int i= 0 ; i< num ; ++i) {
       it = std::next(v.insert(it,a[i]));
    }
  }

  // Method 4
  { 
    std::set<int> v;
    auto it = v.begin();
    for (int i= 0 ; i< num ; ++i) {
       it = v.insert(it,i); ++it;
    }
  }    

  // Method 5 : idiomatic
  {
    std::set<int> v;
    std::copy(a.begin(), a.end(), std::inserter(v,v.end()));
  }
 return 0;
}

Method 1 : slowest (as expected) : ~38 sec Method 2 : Fastest (as expected) : ~8 sec Method 3 : ~20 sec Method 4 : ~20 sec Method 5 : ~20 sec

The results make sense method 3 and 4 are the same and on digging into mehtod 5 i found that std::inserter creates an output iterator which on assignment does exactly the same thing (or translates to the same) as method 3/4.

Is this intentional ? Why can't the algorithms be written in a way that they translate to the most efficient insertion way ? Method 2 gives the exact accurate hint whereas 3,4,5 increment the iterator to the set.end() (in this case when i am inserting a sorted range , the std::next(insert(pos,new_max_element)) == set::end()) and always give that as a hint for insertion.

This makes using stl algorithms inefficient if i use std::inserter to pass an iterator to such ordered containers. on a side note : i do not understand how set_union can work in linear time if the inserting operation to another set will be logarithmic. for exmaple set_union(set_1.begin(), set_1.end(), set_2.begin(), set_2.end(), std::inserter(output_set, output_set.end()) . Sorted vectors is fine but set ? Can any one put some links or referrals to complexity analysis ?

Also it will be great if someone can explain or give some referrals to complexity analysis that proves that inserting into a set with correct hint (for example the next number is always smaller or larger then the current being inserted) gives your algorithm an amoritzed constant complexity instead of logn.

0

There are 0 answers