Use DPC++ oneAPI to improve performance


I am new to OpenCL/oneAPI. How can I change this nested loop so that it runs on a GPU with oneAPI?

try {
    for (int i = 0; i < count; i++) {
        for (int j = 0; j < count; j++) {
            if (a_array[i] * a_array[j] == max) {
                p_found = a_array[i];
                q_found = a_array[j];
                throw "found";
            }
        }
    }
}
catch (...) {
    std::cout << "q = " << q_found << " and p = " << p_found << std::endl;
}

There is 1 answer

ProjectPhysX On BEST ANSWER

Here is what an OpenCL kernel for the task would look like:

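#define count 1024 // number of elements in a_array (example value)
#define max 1.0f // target product (example value)
kernel void find(const global float* a_array, global float* pq_found) {
    const uint n = get_global_id(0); // parallelized across nested double loop; launch with a global size of count*count
    const uint i=n/count, j=n%count; // recover the two loop indices from the flattened index
    const float a_arrayi=a_array[i], a_arrayj=a_array[j];
    if(a_arrayi*a_arrayj==max) { // same test as the body of the original loop
        pq_found[0] = a_arrayi;
        pq_found[1] = a_arrayj;
    }
}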
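
To make the index mapping in the kernel concrete, here is a minimal host-side C++ sketch that does serially what the work-items do in parallel. It is only an illustration, not device code: the names Hit, check_pair and find_pair are hypothetical, and the array length is taken from the vector instead of a count macro.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical result type: found is false when no pair matches.
struct Hit {
    bool found;
    float p;
    float q;
};

// What a single work-item with global id n would do: decode (i, j) from the
// flattened index n and test that one pair. Assumes a is non-empty.
Hit check_pair(const std::vector<float>& a, float target, std::size_t n) {
    const std::size_t count = a.size();
    const std::size_t i = n / count;  // index of the outer loop
    const std::size_t j = n % count;  // index of the inner loop
    if (a[i] * a[j] == target) {
        return Hit{true, a[i], a[j]};
    }
    return Hit{false, 0.0f, 0.0f};
}

// Serial stand-in for the kernel launch: visits n = 0 .. count*count - 1 in
// ascending order and stops at the first hit, like the original nested loop.
Hit find_pair(const std::vector<float>& a, float target) {
    const std::size_t count = a.size();
    for (std::size_t n = 0; n < count * count; ++n) {
        const Hit hit = check_pair(a, target, n);
        if (hit.found) {
            return hit;
        }
    }
    return Hit{false, 0.0f, 0.0f};
}
```

Calling find_pair on the array {2, 3, 5, 7} with a target of 15 visits the pairs in the same order as the nested loop and stops at i = 1, j = 2. On the GPU every value of n is handled by its own work-item, so there is no such "first" hit unless the kernel enforces one.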

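Note that parallelization introduces a small complication. If there is exactly one hit, everything is fine. However, if there is more than one hit, the result will be one of them, and which one is not deterministic: it depends on the order in which the work-items happen to write. Because the two entries of pq_found are written separately, they could in principle even come from different hits.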
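
One possible way to make the result reproducible (this is an assumption on my part, not something the kernel above does) is to have each hit record its flattened index n and keep only the smallest one. Because a[i]*a[j] equals a[j]*a[i], a hit at (i, j) with i != j always comes with a second hit at (j, i), so multiple hits are the common case. The sketch below emulates the idea in plain C++ with std::atomic. The helpers record_hit and smallest_hit are hypothetical names, and the loop over visit_order merely stands in for the unspecified execution order of work-items. In an OpenCL kernel, atomic_min on a global uint could play the role of record_hit.

```cpp
#include <atomic>
#include <cstdint>
#include <vector>

// Sentinel meaning "no hit recorded yet".
constexpr std::uint32_t kNoHit = 0xFFFFFFFFu;

// Hypothetical helper: atomically lowers best to n if n is smaller.
// Stands in for an atomic-min operation on the device.
void record_hit(std::atomic<std::uint32_t>& best, std::uint32_t n) {
    std::uint32_t current = best.load();
    // On failure compare_exchange_weak reloads current, so the loop retries
    // only while n is still smaller than the stored value.
    while (n < current && !best.compare_exchange_weak(current, n)) {
    }
}

// Visits the flattened indices in the given (arbitrary) order and returns the
// smallest index n whose pair (n / count, n % count) multiplies to target,
// or kNoHit if there is none.
std::uint32_t smallest_hit(const std::vector<float>& a, float target,
                           const std::vector<std::uint32_t>& visit_order) {
    const std::uint32_t count = static_cast<std::uint32_t>(a.size());
    if (count == 0) {
        return kNoHit;
    }
    std::atomic<std::uint32_t> best(kNoHit);
    for (std::uint32_t n : visit_order) {
        const std::uint32_t i = n / count;
        const std::uint32_t j = n % count;
        if (a[i] * a[j] == target) {
            record_hit(best, n);
        }
    }
    return best.load();
}
```

The host can then decode the winning pair as i = n / count and j = n % count, which gives the same pair as the serial nested loop regardless of the order in which the hits were recorded.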