SYCL offers NDRange and Hierarchical kernel parallelism abstractions. My questions:
- Is it accurate to say that NDRange maps better onto GPU hardware, while Hierarchical parallelism maps better onto CPU hardware?
- If so, is it realistic to expect NDRange to achieve better performance than Hierarchical parallelism on GPUs, and the opposite on CPUs?