CUDA Thrust - Run Length Encoding with run index

499 views Asked by At

I am trying to build a "run length encoder" which produces a report of occurrences of runs within a file using CUDA Thrust. I will use this "report" to perform the run length encoding step later.

e.g. Input sequence:

inputSequence = [a, a, b, c, a, a, a];

Output sequences:

runChar = [a, a];
runCount = [2, 3];
runPosition = [0, 4];

The output desribes a run of 2 a's starting at position 0 and a run of 3 a's starting at the position 4.

The Thrust run length encoder example described below outputs two arrays - one for the output char and one for its length.

I would like to modify this so runs of less than 2 are excluded and it also outputs the position each run occurs.

 // input data on the host
    const char data[] = "aaabbbbbcddeeeeeeeeeff";

    const size_t N = (sizeof(data) / sizeof(char)) - 1;

    // copy input data to the device
    thrust::device_vector<char> input(data, data + N);

    // allocate storage for output data and run lengths
    thrust::device_vector<char> output(N);
    thrust::device_vector<int>  lengths(N);

    // print the initial data
    std::cout << "input data:" << std::endl;
    thrust::copy(input.begin(), input.end(), std::ostream_iterator<char>(std::cout, ""));
    std::cout << std::endl << std::endl;

    // compute run lengths
    size_t num_runs = thrust::reduce_by_key
                                    (input.begin(), input.end(),          // input key sequence
                                     thrust::constant_iterator<int>(1),   // input value sequence
                                     output.begin(),                      // output key sequence
                                     lengths.begin()                      // output value sequence
                                     ).first - output.begin();            // compute the output size

    // print the output
    std::cout << "run-length encoded output:" << std::endl;
    for(size_t i = 0; i < num_runs; i++)
        std::cout << "(" << output[i] << "," << lengths[i] << ")";
    std::cout << std::endl;

    return 0;
1

There are 1 answers

0
Robert Crovella On

One possible approach, building on what you have shown already:

  1. Take your output lengths, and do an exclusive_scan on them. This creates a corresponding vector of the starting indexes of each run.

  2. Use stream compaction (remove_if) to remove elements from all arrays (output, lengths, and indexes) whose corresponding length is 1. We do this in two steps, the first remove_if operation to clean up output and indexes, using lengths as the stencil, and the second operating directly on lengths. This can probably be significantly improved by operating on all 3 at once, which will make the output length calculation a bit more complicated. How you handle this exactly will depend on which sets of data you intend to retain.

Here is a fully worked example, extending your code:

$ cat t601.cu
#include <iostream>
#include <thrust/device_vector.h>
#include <thrust/copy.h>
#include <thrust/reduce.h>
#include <thrust/scan.h>
#include <thrust/iterator/constant_iterator.h>
#include <thrust/iterator/zip_iterator.h>

struct is_not_one{

template <typename T>
   __host__ __device__
   bool operator()(T data){
     return data != 1;
   }
};

int main(){

// input data on the host
    const char data[] = "aaabbbbbcddeeeeeeeeeff";

    const size_t N = (sizeof(data) / sizeof(char)) - 1;

    // copy input data to the device
    thrust::device_vector<char> input(data, data + N);

    // allocate storage for output data and run lengths
    thrust::device_vector<char> output(N);
    thrust::device_vector<int>  lengths(N);

    // print the initial data
    std::cout << "input data:" << std::endl;
    thrust::copy(input.begin(), input.end(), std::ostream_iterator<char>(std::cout, ""));
    std::cout << std::endl << std::endl;

    // compute run lengths
    size_t num_runs = thrust::reduce_by_key
                                    (input.begin(), input.end(),          // input key sequence
                                     thrust::constant_iterator<int>(1),   // input value sequence
                                     output.begin(),                      // output key sequence
                                     lengths.begin()                      // output value sequence
                                     ).first - output.begin();            // compute the output size

    // print the output
    std::cout << "run-length encoded output:" << std::endl;
    for(size_t i = 0; i < num_runs; i++)
        std::cout << "(" << output[i] << "," << lengths[i] << ")";
    std::cout << std::endl;

    thrust::device_vector<int> indexes(num_runs);
    thrust::exclusive_scan(lengths.begin(), lengths.begin()+num_runs, indexes.begin());
    thrust::device_vector<char> foutput(num_runs);
    thrust::device_vector<int>  findexes(num_runs);
    thrust::device_vector<int>  flengths(num_runs);
    thrust::copy_if(thrust::make_zip_iterator(thrust::make_tuple(output.begin(), indexes.begin())), thrust::make_zip_iterator(thrust::make_tuple(output.begin()+num_runs, indexes.begin()+num_runs)), lengths.begin(), thrust::make_zip_iterator(thrust::make_tuple(foutput.begin(), findexes.begin())), is_not_one());
    size_t fnum_runs = thrust::copy_if(lengths.begin(), lengths.begin()+num_runs, flengths.begin(), is_not_one()) - flengths.begin();
    std::cout << "output: " << std::endl;
    thrust::copy_n(foutput.begin(), fnum_runs, std::ostream_iterator<char>(std::cout, ","));
    std::cout << std::endl << "lengths: " << std::endl;
    thrust::copy_n(flengths.begin(), fnum_runs, std::ostream_iterator<int>(std::cout, ","));
    std::cout << std::endl << "indexes: " << std::endl;
    thrust::copy_n(findexes.begin(), fnum_runs, std::ostream_iterator<int>(std::cout, ","));
    std::cout << std::endl;

    return 0;

}
$ nvcc -arch=sm_20 -o t601 t601.cu
$ ./t601
input data:
aaabbbbbcddeeeeeeeeeff

run-length encoded output:
(a,3)(b,5)(c,1)(d,2)(e,9)(f,2)
output:
a,b,d,e,f,
lengths:
3,5,2,9,2,
indexes:
0,3,9,11,20,
$

I'm certain that this code can be improved upon, but my purpose is to show you one possible general approach.

In my opinion, for future reference, it's not very helpful for you to strip off the include headers from your sample code. I think it's better to provide a complete, compilable code. Not a big deal in this case.

Also note that there are thrust example codes for run length encoding and decoding.