Bit Twiddling Hacks contains the following macros, which count the number of bytes in a word x
that are less than, or greater than, n
:
#define countless(x,n) \
(((~0UL/255*(127+(n))-((x)&~0UL/255*127))&~(x)&~0UL/255*128)/128%255)
#define countmore(x,n) \
(((((x)&~0UL/255*127)+~0UL/255*(127-(n))|(x))&~0UL/255*128)/128%255)
However, it doesn't explain why they work. What's the logic behind these macros?
Let's try for intuition on
countmore
.First,
~0UL/255*(127-n)
is a clever way of copying the value127-n
to all bytes in the word in parallel. Why does it work?~0
is 255 in all bytes. Consequently,~0/255
is1
in all bytes. Multiplying by(127-n)
does the "copying" mentioned at the outset.The term
~0UL/255*127
is just a special case of the above wheren
is zero. It copies 127 into all bytes. That's0x7f7f7f7f
if words are 4 bytes. "Anding" withx
zeros out the high order bit in each byte.That's the first term
(x)&~0UL/255*127)
. The result is the same asx
except the high bit in each byte is zeroed.The second term
~0UL/255*(127-(n))
is as above:127-n
copied to each byte.For any given byte
x[i]
, adding the two terms gives us127-n+x[i]
ifx[i]<=127
. This quantity will have the high order bit set wheneverx[i]>n
. It's easiest to see this as adding two 7-bit unsigned numbers. The result "overflows" into the 8th bit because the result is 128 or more.So it looks like the algorithm is going to use the 8th bit of each byte as a boolean indicating
x[i]>n
.So what about the other case,
x[i]>127
? Here we know the byte is more thann
because the algorithm stipulatesn<=127
. The 8th bit ought to be always 1. Happily, the sum's 8th bit doesn't matter because the next step "or"s the result withx
. Sincex[i]
has the 8th bit set to 1 if and only if it's 128 or larger, this operation "forces" the 8th bit to 1 just when the sum might provide a bad value.To summarize so far, the "or" result has the 8th bit set to 1 in its i'th byte if and only if
x[i]>n
. Nice.The next operation
&~0UL/255*128
sets everything to zero except all those 8th bits of interest. It's "anding" with 0x80808080...Now the task is to find the number of these bits set to 1. For this,
countmore
uses some basic number theory. First it shifts right 7 bits so the bits of interest are b0, b8, b16... The value of this word isA beautiful fact is that 1 == 2^8 == 2^16 == ... mod 255. In other words, each 1 bit is 1 mod 255. It follows that finding mod 255 of the shifted result is the same as summing b0+b8+b16+...
Yikes. We're done.