Sampling conditional distribution OpenMP

93 views Asked by At

I have a function which draws a random sample:

Sample sample();

I have a function which checks weather a sample is valid:

bool is_valid(Sample s);

This simulates a conditional distribution. Now I want a lot of valid samples (most samples will not be valid).

So I want to parallelize this code with openMP

vector<Sample> valid_samples;
while(valid_samples.size() < n) {
    Sample s = sample();
    if(is_valid(s)) {
        valid_samples.push_back(s);
    }
}

How would I do this? Most of the OpenMP code I found were simple for loops where the number of iterations is determined in the beginning.

The sample() function has a

thread_local std::mt19937_64 gen([](){
    std::random_device d;
    std::uniform_int_distribution<int> dist(0,10000);
    return dist(d);
}());

as a random number generator. Is is valid and thread save if I assume that my device has a source of randomness? Are there better solutions?

2

There are 2 answers

6
Daniel Langr On BEST ANSWER

You may employ OpenMP task parallelism. The simplest solution would be to define a task as a single sample insertion:

vector<Sample> valid_samples(n); // need to be resized to allow access in parallel

void insert_ith(size_t i)
{
  do {
    valid_samples[i] = sample();
  } while (!is_valid(valid_samples[i]));
}

#pragma omp parallel
{
  #pragma omp single
  {
    for (size_t i = 0; i < n; i++) 
    {
      #pragma omp task
      insert_ith(i);
    }
  }
}

Note that there might be performance issues with such single-task-single-insertion mapping. First, there would be false sharing involved, but likely worse, tasking management has some overhead which might be significant for very small tasks. In such a case, a remedy is simple — instead of a single insertion per tasks, insert multiple items at once, such as 100. Usually, a suitable number is a trade-off: the lower creates more tasks = more overhead, the higher may result in worse load balancing.

10
schetefan24 On

You need to take care of the critical section in your code, which is the insertion into the answer vector

something like this should work (haven't compiled because functions and types are not given)

// create vector before parallel section because it shall be shared
vector<Sample> valid_samples(n); // set initial size to avoid reallocation
int reached_count = 0;
#pragma omp parallel shared(valid_samples, n, reached_count)
{
    while(reached_count < n) { // changed this, see comments for discussion
        Sample s = sample(); // I assume this to be thread indepent
        if(is_valid(s)) {
            #pragma omp critical
            {
                // check condition again, another thread might have already
                // reached maximum number
                if(reached_count < n) {
                    valid_samples.push_back(s);
                    reached_count = valid_samples.size();
                }
            }
        }
    }
}

note that neither sample() nor isvalid(s) are inside of the critical section because I assume these functions to be far more expensive than the vector insertion or the acceptance is very rare

If that is not the case, you could work with indepent local vectors and merge in the end, but that would only gain a significant benefit if you reduce the number of synchronization in some way, like giving a fixed number of iterations (at least for a large part)