I've made some progress in understanding bit reversal algorithms using non trivial (binary partition) magic constants based on D.Knuth's ideas but i fail to understand how these non continuous masks work, which seems to have bits spread around in various distances. It's always a 2-step process which swaps lsb and msb bits, leaving the middle unchanged but it does this in multiple parts of a register at the same time. LUT, x86 asm based bit reversals fall under the trivial category so consider them irrelevant.
Classic Binary (Recursive) Partition Bit Reversal Algorithm 64bit (trivial)
x = ((x >> 1) & c1) | ( (x & c1) << 1); //0x5555555555555555ULL
x = ((x >> 2) & c2) | ( (x & c2) << 2); //0x3333333333333333ULL
x = ((x >> 4) & c3) | ( (x & c3) << 4); //0x0F0F0F0F0F0F0F0FULL
x = ((x >> 8) & c4) | ( (x & c4) << 8); //0x00FF00FF00FF00FFULL
x = ((x >>16) & c5) | ( (x & c5) <<16); //0x0000FFFF0000FFFFULL
x = ((x >>32) ) | ( (x ) <<32); //0x00000000FFFFFFFFULL (implicit)
32 bit code - hackersdelight.org/revisions.pdf(2016-12-21) p.36
//temp
N4 t;
//17msb <- 17lsb, 15msb -> 15lsb
x = (x << 15) | (x >> 17);
//??
t = (x ^ (x >> 10)) & 0x003F801F;
x = x ^ t ^ (t << 10);
//??
t = (x ^ (x >> 4)) & 0x0E038421;
x = x ^ t ^ (t << 4);
//??
t = (x ^ (x >> 2)) & 0x22488842;
x = x ^ t ^ (t << 2);
return x;
64 bit code - AOCP - Vol.4 - 1A - 7.1.3 - p.13 - (70)
//classic adjacent bit swap
x = ((x >> 1) & c1) | ( (x & c1) << 1);
//??
t = (x ^ (x >> 4)) & 0x0300C0303030C303ULL;
x = x ^ t ^ (t << 4);
//??
t = (x ^ (x >> 8)) & 0x00C0300C03F0003FULL;
x = x ^ t ^ (t << 8);
//??
t = (x ^ (x >>20)) & 0x00000FFC00003FFFULL;
x = x ^ t ^ (t << 20);
x = (x >> 34) | (x << 30);
Explanation: AOCP - Vol.4 - 1A - 7.1.3 - p.13
Paraphrasing Knuth: there is an operation that given a register swaps
some msb-bits with some lsb-bits leaving some in the middle unchanged.
Specifically we can swap bits that are k-positions apart, with k>=1
ie. bits are not overlapping. Operation is given below.
Let 64bit register (x). We want to swap 25-msb and 25-lsb of x, with 14bits (25 + 14 + 25): 64
in the middle left as is. We set k = 64-25 = 39, m = (pow(2,25)-1) = 0x1FFFFFF
(t:temp)(x:input)(m:mask)
t = (x ^ (x >> k)) & m; // ((x-25-msb) xor (x-25-lsb)) -> t-25-lsb
t = t ^ (t << k); // t-25-lsb -> get copied to -> t-25-msb, so t-25-msb/lsb holds: x-msb-25 ^ x-lsb-25
x = x ^ t; // on x-25-msb, msb gets calceled out, so x-25-msb <- x-25-lsb, t-14-middle is zero
// so x-14-midddle xor 0x0 remains as is, x-25-lsb <- x-25-msb because it cancels out with xor op.
//above 3line operation in 2 lines
t = (x ^ (x >> k)) & m;
x = (t ^ (t << k)) ^ x; //line 2
//common ways line 2 is displayed
x = (t + (t << k)) ^ x; //equiv. line 2
x = (t | (t << k)) ^ x; //equiv. line 2
x = (t ^ (t << k)) ^ x; //equiv. line 2
x = x ^ t ^ (t << k) ; //equiv. line 2
operation with letters:
[a,b,c] -> [ 0,0,a] ^ [a,b,c] -> [a,b,c^a] & mask -> [0,0,c^a] -> [0,0,c^a] ^ [c^a,0,0] -> [c^a,0,c^a] ->
-> [a,b,c] ^ [c^a,0,c^a] -> [c,b,a]
Another 64bit reverse (Hacker's Delight)
//swap halves
x = (x << 32) | (x >> 32);
//its the 32bit knuth bit reverse: [17bit m<-l],[15bit m->l] step: applied on both 32bit up/down half of 64bit register.
x = (x & 0x0001FFFF0001FFFFLL) << 15 | (x & 0xFFFE0000FFFE0000LL) >> 17;
t = (x ^ (x >> 10)) & 0x003F801F003F801FULL;
x = (t | (t << 10)) ^ x;
t = (x ^ (x >> 4)) & 0x0E0384210E038421ULL;
x = (t | (t << 4)) ^ x;
t = (x ^ (x >> 2)) & 0x2248884222488842ULL;
x = (t | (t << 2)) ^ x;
Hacker's Delight - Figure 7-4 - p.37
x = (x << 31) | (x >> 33);
t = (x ^ (x >> 20)) & 0x00000FFF800007FFULL;
x = (t |(t << 20)) ^ x;
t = (x ^ (x >> 8)) & 0x00F8000F80700807ULL;
x = (t |(t << 8)) ^ x;
t = (x ^ (x >> 4)) & 0x0808708080807008ULL;
x = (t |(t << 4)) ^ x;
t = (x ^ (x >> 2)) & 0x1111111111111111ULL;
x = (t |(t << 2)) ^ x;