Portability of __builtin_clzll for calculating floor power of 2 and fallback options

64 views Asked by At

I have two alternative code implementations for calculating the largest power of 2 smaller than a given 64-bit unsigned integer (uint64_t):

Implementation 1:

#include <iostream>
#include <cstdint>

uint64_t floor_power_of_2(uint64_t x)
{
  x |= (x >> 1);
  x |= (x >> 2);
  x |= (x >> 4);
  x |= (x >> 8);
  x |= (x >> 16);
  x |= (x >> 32);
  return x - (x >> 1);
}

Implementation 2 (which I prefer, since is a lot faster):

#include <iostream>
#include <cstdint>

uint64_t floor_power_of_2(uint64_t x)
{
 return 1ULL << (63 - __builtin_clzll(x));//63-count of leading zeros
}

My questions are:

  1. Portability of __builtin_clzll(): How portable is Implementation 2 across different compilers, operating systems, and CPU architectures? Does it rely on specific CPU instructions that might not be available everywhere?
  2. Detecting availability and fallback: Is it possible to detect at runtime whether __builtin_clzll() is available, so that the program can use Implementation 1 as a fallback if Implementation 2 is not supported?
  3. Best practice for portability: What's the recommended approach to ensure portability while maintaining fast performance for this calculation?

I'm currently using the GCC compiler, but I've also tested on Compiler Explorer with Clang (which can compile it, and seems to do it better than gcc).

I'm interested in ensuring compatibility with a wide range of platforms.

I'd appreciate any insights or guidance on these questions.

1

There are 1 answers

0
user17732522 On BEST ANSWER

__builtin_clzll is a GCC built-in. Compilers derived from GCC and compilers that try to be compatible with GCC (such as Clang) will support it, but other compilers will not or may have their own alternative built-in.

There is no standard way to detect built-ins. They are completely up to the implementation (as everything containing double underscores). You would need to check which compiler you are compiling under. This task may be better suited to the build system, but you could use the __GNUC__ macro which is defined by both GCC and Clang.

In C++20 a standardization of __builtin_clzll is already present in the standard library as std::countl_zero and I would expect the standard library implementer to carefully implement it in the most efficient way for every platform and compiler they support.

Also, the standard library implementation works correctly on all unsigned integer types automatically and they behave predictably with input 0 which your second implementation doesn't, because according to GCC documentation it has undefined behavior for input 0. Consider what the output should be for 0.

There is also std::bit_floor which is the function that you want to implement. So you can also use that directly with defined behavior at 0.