Reordering of operations and lock-free data structures


Assume we have a Container maintaining a set of int values, plus a flag for each value indicating whether the value is valid. Invalid values are considered to be INT_MAX. Initially, all values are invalid. When a value is accessed for the first time, it is set to INT_MAX and its flag is set to valid.

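#include <climits>  // INT_MAX
#include <vector>

struct Container {
  // Writer side: lazily initializes slot i on first access.
  int& operator[](int i) {
    if (!isValid[i]) {
      values[i] = INT_MAX; // (*)
      isValid[i] = true;   // (**)
    }
    return values[i];
  }
  int get(int i); // reader side, defined below
  std::vector<int> values;
  std::vector<bool> isValid;
};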

Now, another thread reads container values concurrently:

// This member is allowed to overestimate value i, but it must not underestimate it.
int Container::get(int i) {
  return isValid[i] ? values[i] : INT_MAX;
}

This is perfectly valid code, but it is crucial that lines (*) and (**) are executed in the given order.

  1. Does the standard guarantee in this case that the lines are executed in the given order? At least from a single-threaded perspective, the lines could be interchanged, couldn't they?
  2. If not, what is the most efficient way to ensure their order? This is high-performance code, so I cannot go without -O3 and do not want to use volatile.

Answer by Dietmar Kühl (best answer)

There is no synchronization here. If you access these values from one thread and change them from another thread, you have a data race and therefore undefined behavior. You either need a lock around all accesses, in which case things are fine, or you need to make all your std::vector elements std::atomic<T>; then you can control the visibility of the values using the appropriate memory-ordering arguments.
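As a rough illustration of the first option (a lock around all accesses), here is a minimal sketch. The class name LockedContainer, the set/get interface and the self-check function are assumptions made for this example; they are not part of the question's code. The sketch deliberately replaces operator[] with set, because a returned int& would let the caller touch the value after the lock has been released.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <mutex>
#include <vector>

// Hypothetical lock-based variant of the question's Container.
class LockedContainer {
public:
  explicit LockedContainer(std::size_t n)
      : values(n, INT_MAX), isValid(n, false) {}

  // Writer side: both updates happen while the mutex is held, so a
  // reader can never observe the flag without the matching value.
  void set(std::size_t i, int v) {
    std::lock_guard<std::mutex> lock(m);
    values[i] = v;
    isValid[i] = true;
  }

  // Reader side: takes the same mutex, so it sees a consistent pair.
  int get(std::size_t i) const {
    std::lock_guard<std::mutex> lock(m);
    return isValid[i] ? values[i] : INT_MAX;
  }

private:
  mutable std::mutex m;        // guards both vectors
  std::vector<int> values;
  std::vector<bool> isValid;   // bit-packed, so it needs the lock even
                               // when threads use different indices
};

// Small single-threaded check of the interface.
void locked_container_self_check() {
  LockedContainer c(3);
  assert(c.get(1) == INT_MAX);  // untouched slots read as INT_MAX
  c.set(1, 7);
  assert(c.get(1) == 7);
}
```

A single mutex serializes all readers and writers, which is simple and correct but may be too coarse for hot paths; that is the trade-off the atomic approach tries to avoid.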

There seems to be a misunderstanding of what synchronization, and atomic operations in particular, do: their purpose is to make code fast! That may appear counterintuitive, so here is the explanation: non-atomic operations should be as fast as possible, and there are deliberately no guarantees about how exactly they access memory. As long as the compiler and execution system produce the correct results, they are free to do whatever they need or want to do. To achieve good performance, interactions between different threads are assumed not to exist.

In a concurrent system there are, however, interactions between threads. This is where atomic operations enter the stage: they allow you to specify exactly the synchronization that is needed. Thus, they let you tell the compiler the minimal constraints it has to obey to make the thread interaction correct. The compiler will use these indicators to generate the best possible code to achieve what is specified. That code may be identical to code not using any synchronization, although in practice it is normally necessary to also prevent the CPU from reordering operations. As a result, correct use of synchronization results in the most efficient code with only the absolutely necessary overhead.

The tricky part is, to some extent, finding out which synchronizations are needed and minimizing them. Simply not having any allows the compiler and the CPU to reorder operations freely and won't work.
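For the container in the question, one plausible minimal choice is a release store on the flag paired with an acquire load in the reader. The sketch below assumes a single writer thread; the names AtomicContainer, set and demo_concurrent_read are made up for this example and do not come from the question.

```cpp
#include <atomic>
#include <cassert>
#include <climits>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical atomic variant of the question's Container.
// Assumption: one writer thread, any number of reader threads.
class AtomicContainer {
public:
  explicit AtomicContainer(std::size_t n) : values(n), isValid(n) {
    // Initialize explicitly instead of relying on default construction.
    for (std::size_t i = 0; i < n; ++i) {
      values[i].store(INT_MAX, std::memory_order_relaxed);
      isValid[i].store(false, std::memory_order_relaxed);
    }
  }

  // Writer side: store the value first, then publish the flag.
  // The release store keeps the value store from moving after it.
  void set(std::size_t i, int v) {
    values[i].store(v, std::memory_order_relaxed);    // (*)
    isValid[i].store(true, std::memory_order_release); // (**)
  }

  // Reader side: the acquire load pairs with the release store, so a
  // reader that sees the flag also sees the value written before it.
  int get(std::size_t i) const {
    return isValid[i].load(std::memory_order_acquire)
               ? values[i].load(std::memory_order_relaxed)
               : INT_MAX;
  }

private:
  std::vector<std::atomic<int>> values;
  std::vector<std::atomic<bool>> isValid;
};

// Usage sketch: a reader racing with the writer sees either the
// "invalid" value INT_MAX or the published value, never garbage.
int demo_concurrent_read() {
  AtomicContainer c(4);
  std::thread writer([&c] { c.set(2, 42); });
  int seen = c.get(2);
  writer.join();
  assert(seen == INT_MAX || seen == 42);
  return c.get(2);  // the writer has been joined, so this reads 42
}
```

The value itself only needs relaxed accesses here because the flag carries the ordering. Whether that is sufficient for a real workload depends on how values are updated after the first publication, which the question does not spell out.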

Since the question mentioned volatile, please note that volatile is entirely unrelated to concurrency! The primary purpose of volatile is to inform the system that an address points to memory whose access may have side effects. It is primarily used to make memory-mapped I/O or hardware control accessible. Due to the potential for side effects, it is one of the two aspects of C++ defining the semantics of programs (the other one is I/O using standard library I/O facilities).
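To make the intended use of volatile concrete, here is a small sketch in the style of a memory-mapped register access. There is no real device here: fake_control_register is an ordinary volatile variable standing in for a hardware address, and pulse_bit is a hypothetical helper invented for this example.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for a device register. On real hardware this would be a
// fixed address taken from the device documentation (assumption: none
// is available here, so a plain volatile object is used instead).
static volatile std::uint32_t fake_control_register = 0;

// Sets and then clears the bits in mask. Because reg is volatile, the
// compiler has to perform every read and write below, in this order;
// it may not merge them into "nothing changed".
void pulse_bit(volatile std::uint32_t& reg, std::uint32_t mask) {
  reg = reg | mask;   // volatile read, then volatile write (set)
  reg = reg & ~mask;  // volatile read, then volatile write (clear)
}

// Single-threaded check: other bits are left untouched.
void volatile_sketch_self_check() {
  fake_control_register = 0x10u;
  pulse_bit(fake_control_register, 0x1u);
  assert(fake_control_register == 0x10u);
}
```

Nothing in this sketch makes the accesses atomic or orders them with respect to other threads, which is why volatile does not help with the container in the question.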