I have to create a vector of k elements. Every thread will create its portion, let's say k * 25% and must place it into the vector, at any index. Driven from this example, I was about to do something like this:
std::atomic<std::vector<aClass<templatedType>>> H;
but this won't work:
/Library/Developer/CommandLineTools/usr/bin/../include/c++/v1/atomic:823:13: error: 
      _Atomic cannot be applied to type
      'std::__1::vector<StableHashFunction<int>,
      std::__1::allocator<StableHashFunction<int> > >' which is not trivially
      copyable
    mutable _Atomic(_Tp) __a_;
since it's not trivially copyable - and the workaround is not sweet. Moreover, do I need it to be atomic?
That's the question, when I do not care about the order, should I still use an atomic? In other words, do the data races affect only the order of the elements, or they can cause other side-effects, like dodging an element (so instead of k elements, to only have k - 1 in the end)?
 
                        
When you try inserting items in parallel, problems happen only when the vector must change its size: concurrent insertions would have to update the same locations in memory, causing incorrect behavior (crashes, lost items, etc.)
However, setting different elements of an existing vector concurrently does not cause a problem, so a common solution to this problem is pre-allocating the vector.
This approach requires a default constructor of vector element, and an upfront knowledge of exact number of items that each thread is going to place into the vector. For example, if you know that thread 0 is going to place 100 elements, thread 1 is going to place 120 elements, and thread 2 is going to place 110 elements, you pre-allocate a vector of 100+120+110 elements, and give each thread its own initial index: 0, 100, and 220. Now each thread can place items into the vector without running into concurrency problems.