I need to force the Metal compiler to unroll a loop in my kernel compute function. So far I've tried to put #pragma unroll(num_times)
before a for
loop, but the compiler ignores that statement.
It seems that the compiler doesn't unroll the loops automatically — I compared execution times for 1) a code with for
loop 2) the same code but with hand-unrolled loop. The hand-unrolled version was 3 times faster.
E.g.: I want to go from this:
for (int i=0; i<3; i++) {
do_stuff();
}
to this:
do_stuff();
do_stuff();
do_stuff();
Is there even something like loop unrolling in the Metal C++ language? If yes, how can I possibly let the compiler know I want to unroll a loop?
Metal is a subset C++11, and you can try using template metaprogramming to unroll loops. The following compiled in metal, though I don't have time to properly test it:
Please let me know if it works! You'll probably have to add some arguments to
call
to pass arguments todo_stuff
.See also: Self-unrolling macro loop in C/C++