Is it possible to use a DPC++ parallel_for so that the loop index runs from 1 to n-2 instead of starting at 0?
h.parallel_for(range{lx , ly }, [=](id<2> idx
This gives a loop from 0 to lx-1 (and 0 to ly-1), so I have to check
idx[0]>0 && idx[1]>0 && idx[0]<lx-1 && idx[1]<ly-1
inside the kernel and only then run the loop body. Is that the right way to do it?
Also, does DPC++ support something like a 4D parallel_for?
In SYCL 1.2.1, parallel_for supports offsets, so you could use
h.parallel_for(range{lx-2, ly-2}, id{1, 1}, [=](id<2> idx){ ... });
However, this overload has been deprecated in SYCL 2020.
So, if you want to conform to the latest standard, you should apply the offset manually:
h.parallel_for(range{lx-2, ly-2}, [=](id<2> idx0) { id<2> idx = idx0 + 1; ... });
That said, depending on your data layout, your original approach of having "empty" threads might be faster.
As for 4D: no, parallel_for supports at most three dimensions. You will have to use a 1D range and compute the 4D index manually.
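The manual 4D index computation can be sketched roughly as follows. This is only an illustration in plain C++: delinearize4 and linearize4 are hypothetical helper names (not part of SYCL or DPC++), and the layout is assumed to be row-major with the last dimension varying fastest. Adjust the order if your data is laid out differently.

```cpp
#include <array>
#include <cstddef>

// Hypothetical helper (not a SYCL API): split a flat index into 4D
// coordinates {i0, i1, i2, i3} for an extent n0 x n1 x n2 x n3.
// Assumes row-major order, i.e. i3 varies fastest. n0 is not needed
// because i0 is simply whatever remains after the three divisions.
inline std::array<std::size_t, 4> delinearize4(std::size_t linear,
                                               std::size_t n1,
                                               std::size_t n2,
                                               std::size_t n3) {
    const std::size_t i3 = linear % n3;
    linear /= n3;
    const std::size_t i2 = linear % n2;
    linear /= n2;
    const std::size_t i1 = linear % n1;
    const std::size_t i0 = linear / n1;
    return {i0, i1, i2, i3};
}

// Inverse mapping under the same row-major assumption.
inline std::size_t linearize4(const std::array<std::size_t, 4>& i,
                              std::size_t n1,
                              std::size_t n2,
                              std::size_t n3) {
    return ((i[0] * n1 + i[1]) * n2 + i[2]) * n3 + i[3];
}

// Intended use inside a kernel (sketch only, needs a SYCL compiler):
//   h.parallel_for(range<1>{n0 * n1 * n2 * n3}, [=](id<1> idx) {
//       auto c = delinearize4(idx[0], n1, n2, n3);
//       // c[0] .. c[3] are the four loop indices
//   });
```

Another option under the same assumption is to launch a 3D range and split only one of its dimensions into two, which saves one division and modulo per work-item.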