vector copying and multi-threading : how to ensure multi-read even if occasional writes may happen?

1k views Asked by At
Pseudocode:
Function1_vector_copy () {
vectora = vectorb;
}
Function2_vector_search() {
find k in vectora;
}

The program is multi-threaded. While many threads may search , vector copying is done only by a single thread but at occasional times. The problem is whatever it may, vector_search should not fail. Vector_copy can be deferred but vector_search should not be. Its a no delay, no failure pattern. The problem is the shared variable vectora which has to be persistent, so that vector_search is not failed at all. What is an optimal way in achieving this ?

Edit :

Using an answer for another question

boost::shared_mutex _access;

Function1_vector_copy() {

  // get upgradable access
  boost::upgrade_lock lock(_access);

  // get exclusive access
  boost::upgrade_to_unique_lock uniqueLock(lock);
  // now we have exclusive access

  vectora.swap(vectorb);
}

Function2_vector_search() {

  // get shared access
  boost::shared_lock lock(_access);

  // now we have shared access

  find k in vectora ;

}

If a thread with upgradable ownership tries to upgrade whilst other threads have shared ownership, the attempt will fail and the thread will block until exclusive ownership can be acquired. --boost-documentation

The trick is the vector copy is done asynchronously and it will happen after exclusive ownership is acquired. So vector copy does eventually happen but deferred and indeed vector copy is also a non-failure operation but delayed which is okay for me. There will be a lock somehow for synchronization and we will block for a milli second atleast. But the advantage of using this swap will result in lower time of that making it negligible.

2

There are 2 answers

6
mcmcc On BEST ANSWER
Function1_vector_copy () 
{
    VectorType tmp = vectorb;

    rwLock.acquireWrite();
    swap(tmp,vectora);
    rwLock.releaseWrite();
}

The rest is left as an exercise for the reader.

UPDATE: Summarizing the comments below...

First of all I should make clear that the original swap(...) call above was intended as pseudo-code, not necessarily to be taken literally. I didn't know exactly what VectorType was (and technically still don't). I assumed that you would intend on doing as efficient a swap() as was possible. It was pseudo-code after all...

In any case, std::swap() is specialized for all container types in the C++ standard library. So if VectorType is actually std::vector<UDT>, then swap(tmp,vectora); should be both optimal and work without modification (relying on Koenig lookup to resolve the symbol).

6
DXM On

I think mcmcc's idea wasn't too far off, but instead generic std::swap() which will perform vector copy, you should be able to use std::vector::swap(). Functionally vector's swap function works identically to generic one, but it is much faster since it simply exchanges few pointer/size variables and doesn't actually do element copies. So your lock will only be around for few assembly instructions.

If that's still too long (at which point, you might want to review your design), another alternative is to have a circular list of vectors which never replace each other. Instead have another pointer which references the "active" vector. When you need to modify that collection, initialize the next vector in the circle and when initialization is done, update the pointer. With this solution, there's no locks at all. One instant your readers are using one vector, the next instant they are using a different vector.

Downside of the second solution is that it will definitely require more memory and it's a little bit unpredictable. I guess you could declare your "active" vector pointer volatile which would force the CPU to ensure cache is synchronized every time the pointer value is read, otherwise, there's some non-deterministic element where multiple CPU cores might be looking at different snapshot of main memory. Also without any kind of synchronization, how do you ensure that other threads are done with the vector which is about to be blown away? And if you do any kind of synchronization, you are back RW lock.

Although I think you could get the second alternative to work and with some tweaking relatively stably... probably. My first instinct would be to go with RW lock and vector::swap().

UPDATE: Actually, I just reread the specs and looks like STL does specialize global std::swap specifically for containers, including std::vector. So even when calling global function, complexity is O(1). mcmcc's answer works.

So we put this guessing to rest, I just wrote a little bit of code and stepped through it with debugger. Here's my application code:

std::vector< int >      a, b;

for( int i = 0; i < 10; i++ )
{
    a.push_back( i );
    b.push_back( i * 10 );
}

std::swap( a, b );

and here's what's inside the call to global, generic version of std::swap:

template<class _Ty,
class _Alloc> inline
void swap(vector<_Ty, _Alloc>& _Left, vector<_Ty, _Alloc>& _Right)
{   // swap _Left and _Right vectors
_Left.swap(_Right);
}

As you can see generic std::swap template function, which in generic mode does the copy constructor thing, has been specialized specifically for vectors. And internally it simply delegates all the work to std::vector::swap(), which as we established is a faster variant.