Scan for quicksort splitting

185 views Asked by At

I'm trying to write quicksort with OpenMP, and I'm getting stuck in the splitting part. The literature says that this is a parallel prefix operation. Here's the relevant bit:

  vector<int> under_pivot(values.size()), over_pivot(values.size());
  int count_under=0, count_over=0;
#pragma omp parallel for \
        reduction(+:count_under,count_over) \
        reduction(inscan,+:under_pivot,over_pivot)
  for (int i=0; i<values.size(); i++) {
    auto v = values[i];
#   pragma omp scan inclusive(under_pivot,over_pivot)
    {
      under_pivot[i] = count_under;
      over_pivot[i]  = count_over;
    }
    count_under += 1 ? v<pivot : 0;
    count_over  += 1 ? v>pivot : 0;
  }

(Yes, I've defined the + operator for vectors). The problem seems to be the double reduction, if I understand the compiler error correctly:

quicksort.cxx:59:9: error: expected 'reduction' clause with the 'inscan' modifier
        reduction(+:count_under,count_over) \
        ^
quicksort.cxx:60:19: note: 'reduction' clause with 'inscan' modifier is used here
        reduction(inscan,+:under_pivot,over_pivot)
                  ^
1 error generated.

But I really need two reductions because the scalars are not in a scan: a scan only pertains to arrays.

Compiler explorer with full code: https://godbolt.org/z/6xPrnGjMY

1

There are 1 answers

3
Victor Eijkhout On

With thanks to @paleonix, here is the correct code:

  int count_under=0, count_over=0;
  vector<int> under_pivot(values.size(),-1), over_pivot(values.size(),-1);
#pragma omp parallel for \
        reduction(inscan,+:count_under,count_over)
  for (int i=0; i<values.size(); i++) {
    under_pivot[i] = count_under;
    over_pivot[i]  = count_over;
#   pragma omp scan exclusive(count_under, count_over)
    {
      count_under += values[i]<pivot ? 1 : 0;
      count_over  += values[i]>pivot ? 1 : 0;
    }
  }
  int s = under_pivot.size();
  count_under = under_pivot.back()
    + ( under_pivot[s-2]==under_pivot[s-1] ? 1 : 0 );

  cout << "Splitting with " << count_under << " before the pivot\n";
  cout << "under: " << under_pivot << "\n";
  cout << "over:  " << over_pivot << "\n";

  vector<float> newvalues(values.size(),-1);
  for (int i=0; i<values.size(); i++) {
    auto v = values[i];
    int loc;
    try {
      if (v<pivot) {
        loc = under_pivot[i] ;
        newvalues.at(loc) = v;
      } else {
        loc = over_pivot[i]  + count_under;
        newvalues.at(loc) = v;
      }
    } catch (...) {
      cout << "element " << i << ": " << v << " maps to loc " << loc << "\n";
    }
  }

This seems to work but there are several strange aspects. For instance, why are the reduction variables zero after the reduction loop? I tried an inclusive scan and got nothing but rubbish.

EDIT I have an extremely hackish solution for count_under which compensates for some weird border case, but that's because clang13 has a bug in scan handling. Let me find a better compiler.

Anyway, parallel quick sort, here we come.....