I have this function which, given a Gray code, returns the next Gray code. You can find a more complete explanation about how it works here. I wanted to make this increment function modular, so that incrementing the Gray code corresponding to UINT_MAX (the Gray code with only the most significant bit set) wraps around and returns the Gray code corresponding to 0 (which is 0 itself). Since this is not the default behaviour, I added a check for this special case. Here is the complete algorithm:
#include <limits.h>   // CHAR_BIT, UINT_MAX
#include <stdbool.h>  // false

unsigned next_gray(unsigned gray)
{
    static const unsigned msb
        = 1u << (CHAR_BIT * sizeof(unsigned) - 1u);

    // gray is odd
    if (__builtin_parity(gray))
    {
        // special case: the Gray code of UINT_MAX, wrap around to 0
        if (__builtin_expect(gray == msb, false))
        {
            return 0u;
        }
        else
        {
            // flip the bit to the left of the lowest set bit
            unsigned y = gray & -gray;
            return gray ^ (y << 1u);
        }
    }

    // gray is even: flip the lowest bit
    return gray ^ 1;
}
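For illustration, a small driver along these lines exercises the wrap-around (to_gray is a hypothetical helper using the usual n ^ (n >> 1) binary-to-Gray conversion, added here only for the test):

#include <assert.h>
#include <limits.h>

// hypothetical helper: binary -> reflected binary Gray code
static unsigned to_gray(unsigned n)
{
    return n ^ (n >> 1);
}

int main(void)
{
    assert(next_gray(to_gray(0u)) == to_gray(1u));
    assert(next_gray(to_gray(41u)) == to_gray(42u));
    // the special case: incrementing the Gray code of UINT_MAX wraps to 0
    assert(next_gray(to_gray(UINT_MAX)) == 0u);
    return 0;
}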
So, the actual question is about branch prediction. I have often read that __builtin_expect should only be used when a branch is really likely or really unlikely to be taken, the common example being to speed a program up along the path where no errors occur. Considering that I am not handling an error case, I am unsure whether using __builtin_expect for a bounds check like this is a good idea at all. Is this a good place to use __builtin_expect, or is incrementing the maximum value a common enough operation to mislead the branch predictor?
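For reference, the error-handling idiom I have in mind looks roughly like the following sketch (the likely/unlikely macros are the well-known wrappers around __builtin_expect, as used for example in the Linux kernel; checked_alloc is just a made-up illustration):

#include <stdlib.h>

// common wrappers around __builtin_expect
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

void *checked_alloc(size_t size)
{
    void *ptr = malloc(size);
    if (unlikely(ptr == NULL))  // error path, expected to be taken almost never
    {
        abort();
    }
    return ptr;
}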
Note: as always, comments and answers highlight things that are not clear in my questions :)
I will give a bit more context: this function is meant to be part of a library, developed for the sake of being a library and not used by any actual project as of now. Therefore, adding __builtin_expect implies that I expect people to mostly increment values other than the maximum value; without any actual project at hand, I wish to know whether this is a safe assumption.
Taken from the GCC online docs:
You may use __builtin_expect to provide the compiler with branch prediction information. In general, you should prefer to use actual profile feedback for this (-fprofile-arcs), as programmers are notoriously bad at predicting how their programs actually perform. However, there are applications in which this data is hard to collect.

This all depends on your application. If the value of gray is uniformly distributed, then the edge case will be hit only 1 time out of (UINT_MAX + 1), but can you say that for sure? This is why the docs recommend using -fprofile-arcs. The gcov Wikipedia article actually contains a nice, simple example of how to use -fprofile-arcs and gcov to get the information needed to make an informed decision.
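Roughly, the workflow looks like this (just a sketch; next_gray.c and test_next_gray.c are placeholders for however you build your tests):

# build with instrumentation for coverage / branch counting
gcc -O2 -fprofile-arcs -ftest-coverage next_gray.c test_next_gray.c -o test_next_gray

# run a representative workload; this writes the .gcda profile data
./test_next_gray

# report branch probabilities for the instrumented source
gcov -b next_gray.c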
Update:
If you can't profile, then all things being equal the edge case gray == msb is very unlikely, so you are probably safe with your use of __builtin_expect. However, if you can't profile because you don't know how your library is going to be used, this sounds more like pessimization than optimization. If I use your library and always pass in gray such that it equals msb, then your library won't be as fast for me. General libraries that aren't written with a specific application in mind usually try to be good for the average case, without making assumptions about the input. This is why you see different implementations of malloc such as jemalloc and tcmalloc: both are optimized for very specific use cases, and if you use them in a way they weren't optimized for, they won't do as well. Also, this blog article might be of some interest to you.