Loop unrolling in Metal kernels


I need to force the Metal compiler to unroll a loop in my kernel compute function. So far I've tried putting #pragma unroll(num_times) before the for loop, but the compiler ignores it.

The compiler doesn't seem to unroll loops automatically: I compared execution times for (1) code with a for loop and (2) the same code with the loop unrolled by hand. The hand-unrolled version was 3 times faster.

E.g.: I want to go from this:

for (int i=0; i<3; i++) {
    do_stuff();
}

to this:

do_stuff();
do_stuff();
do_stuff();

Does the Metal shading language support loop unrolling at all? If so, how can I tell the compiler that I want a loop unrolled?

There is 1 answer

Taylor (best answer)

Metal is based on a subset of C++11, so you can try using template metaprogramming to unroll loops. The following compiled in Metal, though I haven't had time to test it properly:

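// Recursive case: call f once, then instantiate the remaining N-1 calls.
template <unsigned N> struct unroll {
    template <class F>
    static void call(F f) {
        f();
        unroll<N-1>::call(f);
    }
};

// Base case: N == 0 ends the recursion and does nothing.
template <> struct unroll<0u> {
    template <class F>
    static void call(F f) {}
};

kernel void test() {
    // Instantiates three nested calls, each invoking do_stuff() once.
    unroll<3>::call(do_stuff);
}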

Please let me know if it works! You'll probably need to add parameters to call so that arguments can be passed through to do_stuff.
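One possible way to get arguments into the loop body, sketched in plain C++11 rather than tested in the Metal compiler: store the data in a small functor and have unroll pass each iteration index to it. The names add_element and sum_first_three are hypothetical stand-ins for do_stuff and a kernel. In a real Metal kernel the pointers would also need address-space qualifiers such as device or thread.

```cpp
// Plain C++11 sketch; not verified against the Metal compiler.
// Recursive case: run iterations 0..N-2 first, then iteration N-1,
// so the body sees the indices in ascending order.
template <unsigned N>
struct unroll {
    template <class F>
    static void call(F f) {
        unroll<N - 1>::call(f);
        f(N - 1);
    }
};

// Base case: zero iterations, nothing to do.
template <>
struct unroll<0u> {
    template <class F>
    static void call(F) {}
};

// Hypothetical loop body standing in for do_stuff. Its arguments travel
// as members, so unroll itself needs no extra parameters.
struct add_element {
    const float* in;
    float* sum;
    void operator()(unsigned i) const { *sum += in[i]; }
};

// Hypothetical caller standing in for the kernel function.
float sum_first_three(const float* in) {
    float sum = 0.0f;
    add_element body = {in, &sum};
    unroll<3>::call(body);  // same effect as body(0); body(1); body(2);
    return sum;
}
```

Whether the resulting calls are actually flattened into straight-line code still depends on the compiler inlining them, so it is worth timing this against the hand-unrolled version.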

See also: Self-unrolling macro loop in C/C++