Constructing a grouped data structure with/for TBB

180 views Asked by At

Recently I've been looking at using TBB rather than boost.threads to speed up development. Generally the parallel_for works in most cases but I have a situation here that is a bit more complex.

There is an array of structs that needs computation that has been sorted according to a member variable. This is because the variables value relates to data that will be accessed during computation and grouping structs according to this will allow for cache coherency in a serial design.

#include <tbb/tbb.h>
#include <iostream>

struct thing
{
  float value_one;
  float value_two;
  unsigned int sort_id;
};

class functor
{
  thing* m_array;
public:
  functor(thing* _array) : m_array(_array) {;}
  void operator()(const tbb::blocked_range<unsigned int>& r) const
  {
    for(int i = r.begin(); i != r.end(); ++i)
    {
      //Doing a computation with array
      m_array[i].value_one = m_array[i].value_two * m_array[i].value_two;
    }
  }
};

int main(int argc, char const *argv[])
{
  unsigned int n = 10;
  thing* array = new thing[n];

  // Note the ordered id groups
  array[0].sort_id = 1;
  array[1].sort_id = 1;
  array[2].sort_id = 1;
  array[3].sort_id = 2;
  array[4].sort_id = 3;
  array[5].sort_id = 5;
  array[6].sort_id = 5;
  array[7].sort_id = 9;
  array[8].sort_id = 9;
  array[9].sort_id = 9;

  // Do something parallel with array here...
  // parallel_for(tbb::blocked_range<unsigned int>(0, n, 2), functor(array));

  delete[] array;

  return 0;
}

A simplified example is given above but in reality I'll most likely have an array of around 30-60 million elements.

I understand that parallel_for will divide the array into grouped ranges. However I would like each range to contain all the structs of a particular id. I don't mind if the range contains structs of multiple ids as long as they are sequential and contain all structs of both those ids.

int count = 0;
thing** blocks = new thing*[7];
int* sizes = new int[7];

int current_id = 0;
for(unsigned int i = 0; i < n; ++i)
{
  if(array[i].sort_id != current_id)
  {
    current_id = array[i].sort_id;
    blocks[count] = &array[i];
    sizes[count] = 1;
    ++count;
  }
  else
  {
    sizes[count - 1] += 1;
  }
}

parallel_for(tbb::blocked_range<unsigned int>(0, count, 2), functor(blocks, sizes));

Should I somehow divide the array into smaller chunks pointed to by another array that is then parallelized (as in the code directly above), and if so, what would be an efficient way to do this or is the example given optimal? Is there an alternative to parallel_for (such as task_group) that would be better suited to this problem?

1

There are 1 answers

0
Anton On BEST ANSWER

The question is still not quite clear to me since you are mixing the goal and possible approaches.

If you need to sort the array, there is parallel_sort

If you need to build an index to the sorted array where the sort_id given as a key mapped to the index where the given group of elements resides in the main array, use concurrent_unordered_map to store the groups (if there are big amount of groups) and use parallel_for to build it up.

If number of groups is less than about hundreds, you can use std::map or std::unordered_map and use parallel_reduce in order to construct partial maps and merge them to the final state.

And finally, when you'll have the right data structure with groups, you can use parallel_for across the groups as you want.

P.S. this:

grouped ranges called tasks that will be added to the stack for computation

sounds really weird to me. There is a user functor (or C++11 lambda) which can be invoked in parallel in order to process different ranges [begin;end). If you call the functor a 'task', it's ok but do not mix it with tbb::task.