How to iterate over all possible executions of a program?

155 views Asked by At

How to iterate over all possible executions of a program?

The C++ standard says that the behavior of a program is undefined if at least one of all possible executions of this program has a data race.

How does one get (or iterate over) all possible executions of a program for a given input?
Is there an algorithm to do this?

Right now this task seems very complicated to me.
The main problem is that a read in one thread might return a future write to this variable from another thread.
But at the time we evaluate the read, we don't know what writes will happen in another threads in the future.
More than that, what future writes will happen in other threads may depend on the value returned by this read.

2

There are 2 answers

3
JohnFilleau On

There are no tools or algorithms you can use to iterate over all possible executions of a generic C++ program.

7
darclander On

As many others have stated in the comments this is not a trivial problem and depending on what you actually want this can turn very complex.

Static program analysis (with pi-calculus)
So I am not very familiar with this area but I have tried my best to summarize what I think I understand. Please if there are things to improve suggest an edit. If we have a simple program with a data race such as:

#include <iostream>
#include <thread>

int sharedVariable = 0;

void threadFunction1() {
    for (int i = 0; i < 1000000; ++i) {
        sharedVariable += 1;
    }
}

void threadFunction2() {
    for (int i = 0; i < 1000000; ++i) {
        sharedVariable -= 1;
    }
}

int main() {
    std::thread t1(threadFunction1);
    std::thread t2(threadFunction2);

    t1.join();
    t2.join();

    std::cout << "Final sharedVariable value: " << sharedVariable << std::endl;

    return 0;
}

We will have different outputs of the sharedVariable each execution. By using pi-calculus (my poor example) we can represent the workflow of the program such as:

Process P = sharedVariable[sharedVariable + 1]
Process Q = sharedVariable[sharedVariable - 1]

P, Q :: = | sharedVariable[0]   // Initialize the shared variable
          | P | Q               // Run P and Q in parallel 
          | 0                   // Terminate the program

We can determine that there will be a data race because we are running two processes in parallel that are modifying the same value. The issue is still the complexity and perhaps the tools which can identify these races? Perhaps an idea would be to make an interpreter for reading (manually) created pi-calculus syntax?

Dynamic race detection
From my understanding the thing you are looking for are methods to discover if a given program contains ANY data races. There are interesting articles (although a bit old) on the topic, for example: https://dl.acm.org/doi/abs/10.1145/1147403.1147416.

The authors claim that you can use algorithm four to detect IF there is at least one data race for a memory location x although it might not find all of them. I quote:

At any point during execution of the program, let Sw denote the last segment to write x and Sr the last segment to read x after Sw , if any. Let R denote a set that consists of Sr and each read-segment between Sw and Sr , that is parallel to Sr . We keep track of the segment Sw and the set R, but not the segment Sr . Initially, Sw is undefined and R is empty. Treat each notation of the form S′ ‖ S used below as an abbreviation of the clause: S′ is defined and parallel to S. been implicit. Now we describe an algorithm where we al- ways keep the last segment to write x and the last segment to read x for each active thread. Like Algorithm 4, this algorithm is able to detect at least one race if there are races. The storage requirements are different however; we comment on that issue in Remark 18 below. First, we need some definitions and results. We say that there is a current flow conflict in x between two segments Si and Sj , where 1 ≤ i < j ≤ n, if at some given point, Si is the last segment to write x on its thread, Sj is the last segment to read x on its thread, and Si ‖ Sj . Current conflicts of the other three types are defined in a similar way. A current race in x is a current conflict of type flow, anti, or output. Our algorithm in this subsection is based on the following theorem. Theorem 17. If there is an access conflict in x of a given type, then there must be a current conflict in x of the same type. Proof. The particular type of conflict chosen is irrelevant here. For definiteness, assume that there is a flow conflict in x between two segments Si and Sj such that 1 ≤ i < j ≤ n. Focus on the moment when Sj becomes the last segment to read x on its thread. By hypothesis, Si has already written x. If it is currently the last segment to write x on its thread, then we have a current flow conflict, and there is nothing left to prove. Assume then that Si is not currently the last segment to write x on its thread. Let Sq denote the last segment to write x on the thread of Si, when Sj is the last segment to read x on the thread of Sj . Note that we have i < q < j. By hypothesis, Si and Sj are parallel. Lemma 6 then implies that either the segments Si and Sq are parallel, or the segments Sq and Sj are parallel. The first possibility is ruled out since Si and Sq belong to the same thread. Hence, Sq and Sj are parallel. These two segments cause a current conflict in x.