unrolling for loops in a special case function

934 views Asked by At

So I'm trying to optimize some code. I have a function with a variable sized loop. However for efficiency sake I want to make cases with 1, 2 and 3 sized loops special cases that are completely unrolled. My approach so far is to declare the loop size as a const parameter then define wrapper functions that call the main function handing it a literal for the const value. I've included a code snip it illustrating the kind of thing i have in mind.

inline void someFunction (const int a)
{
    for (int i=0; i<a; i++)
    {
        // do something with i.
    }
}

void specialCase()
{
    someFunction (3);
}

void generalCase(int a)
{
    someFunction (a);
}

So my question is is it reasonable for me to expect my compiler (GCC) to unroll the for loop inside of specialCase. I'm mean obviously I can copy - paste the contents of someFunction into specialCase and replace a with 3 but I'd rather only deal with one definition of someFunction in my code for clarity sake.

4

There are 4 answers

1
Vittorio Romeo On

However for efficiency sake I want to make cases with 1, 2 and 3 sized loops special cases that are completely unrolled.

Have you measured that this is actually faster? I doubt it will be (or that the compiler won't unroll the loop automatically).


My approach so far is to declare the loop size as a const parameter then define wrapper functions that call the main function handing it a literal for the const value.

const doesn't mean anything here. It won't affect the compiler's ability to unroll the loop. It just means that a cannot be mutated inside the function body, but it's still a runtime argument.


If you want to ensure unrolling, then force it. It's quite easy with C++17.

template <typename F, std::size_t... Is>
void repeat_unrolled_impl(F&& f, std::index_sequence<Is...>)
{
    (f(std::integral_constant<std::size_t, Is>{}), ...);
}

template <std::size_t Iterations, typename F>
void repeat_unrolled(F&& f)
{
    repeat_unrolled_impl(std::forward<F>(f), 
                         std::make_index_sequence<Iterations>{});
}

live example on godbolt

0
Bonita Montero On

How about this C++20 unrolling-helpers:

#pragma once
#include <utility>
#include <concepts>
#include <iterator>

template<size_t N, typename Fn>
    requires (N >= 1) && requires( Fn fn, size_t i ) { { fn( i ) } -> std::same_as<void>; }
inline
void unroll( Fn fn )
{
    auto unroll_n = [&]<size_t ... Indices>( std::index_sequence<Indices ...> )
    {
        (fn( Indices ), ...);
    };
    unroll_n( std::make_index_sequence<N>() );
}

template<size_t N, typename Fn>
    requires (N >= 1) && requires( Fn fn ) { { fn() } -> std::same_as<void>; }
inline
void unroll( Fn fn )
{
    auto unroll_n = [&]<size_t ... Indices>( std::index_sequence<Indices ...> )
    {
        return ((Indices, fn()), ...);
    };
    unroll_n( std::make_index_sequence<N>() );
}

template<size_t N, typename Fn>
    requires (N >= 1) && requires( Fn fn, size_t i ) { { fn( i ) } -> std::convertible_to<bool>; }
inline
bool unroll( Fn fn )
{
    auto unroll_n = [&]<size_t ... Indices>( std::index_sequence<Indices ...> ) -> bool
    {
        return (fn( Indices ) && ...);
    };
    return unroll_n( std::make_index_sequence<N>() );
}

template<size_t N, typename Fn>
    requires (N >= 1) && requires( Fn fn ) { { fn() } -> std::convertible_to<bool>; }
inline
bool unroll( Fn fn )
{
    auto unroll_n = [&]<size_t ... Indices>( std::index_sequence<Indices ...> ) -> bool
    {
        return ((Indices, fn()) && ...);
    };
    return unroll_n( std::make_index_sequence<N>() );
}

template<std::size_t N, typename RandomIt, typename UnaryFunction>
    requires std::random_access_iterator<RandomIt>
    && requires( UnaryFunction fn, typename std::iterator_traits<RandomIt>::value_type elem ) { { fn( elem ) }; }
inline
RandomIt unroll_for_each( RandomIt begin, RandomIt end, UnaryFunction fn )
{
    RandomIt &it = begin;
    if constexpr( N > 1 )
        for( ; it + N <= end; it += N )
            unroll<N>( [&]( size_t i ) { fn( it[i] ); } );
    for( ; it < end; ++it )
        fn( *begin );
    return it;
}

But be aware that the unrolling-factor is crucial here. Unrolling is not always beneficial and sometimes unrolling beyond the optimal CPU-specific unrolling-factor drops to the performance without unrolling.

0
geza On

I'd rather go this way, if you're willing to use the force-inline (non-standard) feature of all popular compilers:

__attribute__((always_inline))
void bodyOfLoop(int i) {
  // put code here
}

void specialCase() {
    bodyOfLoop(0);
    bodyOfLoop(1);
    bodyOfLoop(2);
}

void generalCase(int a) {
    for (int i=0; i<a; i++) {
        bodyOfLoop(i);
    }
}

Note: this is GCC/Clang solution. Use __forceinline for MSVC.

5
Richard Hodges On

If you don't like templates and don't trust your compiler, there's always this method, which is inspired by the outdated method of manually unrolling loops called "duff's device":

void do_something(int i);

void do_something_n_times(int n)
{
    int i = 0;
    switch(n)
    {
        default:
            while(n > 3) {
                do_something(i++);
                --n;
            }
        case 3: do_something(i++);
        case 2: do_something(i++);
        case 1: do_something(i++);
    }
}

But I think it's worth saying that if you don't trust your compiler to do something so simple as loop unrolling for you, it's probably time to consider a new compiler.

Note that Duff's device was originally invented as a micro-optimisation strategy for programs compiled with compilers that did not automatically apply loop-unrolling optimisations.

It was invented by Tom Duff in 1983.

https://en.wikipedia.org/wiki/Duff%27s_device

Its use with modern compilers is questionable.