SWAR byte counting methods from 'Bit Twiddling Hacks' - why do they work?

Question

SWAR byte counting methods from 'Bit Twiddling Hacks' - why do they work?

406 views Asked by Koz Ross At 10 September 2024 at 21:33

Bit Twiddling Hacks contains the following macros, which count the number of bytes in a word x that are less than, or greater than, n:

#define countless(x,n) \
(((~0UL/255*(127+(n))-((x)&~0UL/255*127))&~(x)&~0UL/255*128)/128%255)

#define countmore(x,n) \
(((((x)&~0UL/255*127)+~0UL/255*(127-(n))|(x))&~0UL/255*128)/128%255)

However, it doesn't explain why they work. What's the logic behind these macros?

Original Q&A

There are 2 answers

HamidReza On 08 July 2021 at 08:32

Let's analyse countless macro. We can simplify this macro as following code:

#define A(n) (0x0101010101010101UL * (0x7F+n))
#define B(x) (x & 0x7F7F7F7F7F7F7F7FUL)
#define C(x,n)     (A(n) - B(x))
#define countless(x,n)  ((  C(x,n)  &  ~x  & 0x8080808080808080UL) / 0x80 % 0xFF )

A(n) will be:

A(0) = 0x7F7F7F7F7F7F7F7F
A(1) = 0x8080808080808080
A(2) = 0x8181818181818181
A(3) = 0x8282828282828282
....

And for B(x), each byte of x will mask with 0x7F. If we suppose x = 0xb0b1b2b3b4b5b6b7 and n = 0, then C(x,n) will equals to 0x(0x7F-b0)(0x7F-b1)(0x7F-b2)...

For example, We suppose x = 0x1234567811335577 and n = 0x50. So:

A(0x50) = 0xCFCFCFCFCFCFCFCF
B(0x1234567811335577) = 0x1234567811335577
C(0x1234567811335577, 0x50) = 0xBD9B7957BE9C7A58
~(0x1234567811335577) = 0xEDCBA987EECCAA88
0xEDCBA987EECCAA88  & 0x8080808080808080UL = 0x8080808080808080
C(0x1234567811335577, 0x50) & 0x8080808080808080 = 0x8080000080800000
(0x8080000080800000 / 0x80) % 0xFF =  4 //Count bytes that equal to 0x80 value.

**Gene** · Accepted Answer · 2021-07-10 05:46:27

Let's try for intuition on countmore.

First, ~0UL/255*(127-n) is a clever way of copying the value 127-n to all bytes in the word in parallel. Why does it work? ~0 is 255 in all bytes. Consequently, ~0/255 is 1 in all bytes. Multiplying by (127-n) does the "copying" mentioned at the outset.

The term ~0UL/255*127 is just a special case of the above where n is zero. It copies 127 into all bytes. That's 0x7f7f7f7f if words are 4 bytes. "Anding" with x zeros out the high order bit in each byte.

That's the first term (x)&~0UL/255*127). The result is the same as x except the high bit in each byte is zeroed.

The second term ~0UL/255*(127-(n)) is as above: 127-n copied to each byte.

For any given byte x[i], adding the two terms gives us 127-n+x[i] if x[i]<=127. This quantity will have the high order bit set whenever x[i]>n. It's easiest to see this as adding two 7-bit unsigned numbers. The result "overflows" into the 8th bit because the result is 128 or more.

So it looks like the algorithm is going to use the 8th bit of each byte as a boolean indicating x[i]>n.

So what about the other case, x[i]>127? Here we know the byte is more than n because the algorithm stipulates n<=127. The 8th bit ought to be always 1. Happily, the sum's 8th bit doesn't matter because the next step "or"s the result with x. Since x[i] has the 8th bit set to 1 if and only if it's 128 or larger, this operation "forces" the 8th bit to 1 just when the sum might provide a bad value.

To summarize so far, the "or" result has the 8th bit set to 1 in its i'th byte if and only if x[i]>n. Nice.

The next operation &~0UL/255*128 sets everything to zero except all those 8th bits of interest. It's "anding" with 0x80808080...

Now the task is to find the number of these bits set to 1. For this, countmore uses some basic number theory. First it shifts right 7 bits so the bits of interest are b0, b8, b16... The value of this word is

b0 + b8*2^8 + b16*2^16 + ...

A beautiful fact is that 1 == 2^8 == 2^16 == ... mod 255. In other words, each 1 bit is 1 mod 255. It follows that finding mod 255 of the shifted result is the same as summing b0+b8+b16+...

Yikes. We're done.

TechQA.

SWAR byte counting methods from 'Bit Twiddling Hacks' - why do they work?

There are 2 answers

Related Questions in C

Related Questions in BIT-MANIPULATION

Related Questions in SWAR

Popular Questions

Popular Tags

Trending Questions