I have a char array char input[8] = "abcdabcd"; and I want to flip it bitwise along the diagonal (that is, transpose it as an 8x8 bit matrix), which means:
input:
input[0] == 'a': 0 1 1 0 0 0 0 1
input[1] == 'b': 0 1 1 0 0 0 1 0
input[2] == 'c': 0 1 1 0 0 0 1 1
input[3] == 'd': 0 1 1 0 0 1 0 0
input[4] == 'a': 0 1 1 0 0 0 0 1
input[5] == 'b': 0 1 1 0 0 0 1 0
input[6] == 'c': 0 1 1 0 0 0 1 1
input[7] == 'd': 0 1 1 0 0 1 0 0
output:
                   a b c d a b c d
output[0] ==   0 : 0 0 0 0 0 0 0 0
output[1] == 255 : 1 1 1 1 1 1 1 1
output[2] == 255 : 1 1 1 1 1 1 1 1
output[3] ==   0 : 0 0 0 0 0 0 0 0
output[4] ==   0 : 0 0 0 0 0 0 0 0
output[5] ==  17 : 0 0 0 1 0 0 0 1
output[6] == 102 : 0 1 1 0 0 1 1 0
output[7] == 170 : 1 0 1 0 1 0 1 0
Obviously we could use two nested loops and bitwise OR to set each target bit one by one, but that needs at least 64 * n operations, which I don't think is efficient.
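For reference, here is a minimal sketch of that two-loop approach in C. It assumes uint8_t rows and the bit order of the tables above: output[i] collects bit (7 - i) of every input byte, and input[0] supplies the most significant bit. The name transpose8_naive is hypothetical. It does 64 single-bit steps per 8x8 block, which is exactly the cost in question, but it is a handy correctness check for faster versions.

```c
#include <stdint.h>

/* Hypothetical reference version: one bit at a time, 64 steps per block.
   output[i] gets bit (7 - i) of each input byte; input[0] lands in the
   most significant bit of output[i], matching the tables in the question. */
void transpose8_naive(const uint8_t in[8], uint8_t out[8]) {
    for (int i = 0; i < 8; i++) {
        uint8_t row = 0;
        for (int j = 0; j < 8; j++) {
            uint8_t bit = (uint8_t)((in[j] >> (7 - i)) & 1u); /* column i of row j */
            row |= (uint8_t)(bit << (7 - j));                 /* becomes column j of row i */
        }
        out[i] = row;
    }
}
```

Called on {'a','b','c','d','a','b','c','d'}, it fills out with 0, 255, 255, 0, 0, 17, 102, 170, the same values as the output table.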
Since the input and output are just the same memory read in different directions (by row versus by column), is there a more efficient way?
Also, I think it is quite acceptable to rely on a special memory layout, or to change the number of characters in the array, if that helps.
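One layout that makes this easier is to treat the eight rows as a single 64-bit word, so the whole 8x8 block fits in one register. Below is a minimal sketch that assumes row 0 goes into the most significant byte. The helper names pack_rows and unpack_rows are hypothetical. Building the word with shifts rather than memcpy keeps the byte order the same on little- and big-endian machines.

```c
#include <stdint.h>

/* Hypothetical helper: pack 8 row bytes into one word, rows[0] in the
   most significant byte, independent of the machine's endianness. */
uint64_t pack_rows(const uint8_t rows[8]) {
    uint64_t w = 0;
    for (int i = 0; i < 8; i++)
        w = (w << 8) | rows[i];
    return w;
}

/* Hypothetical inverse of pack_rows: most significant byte back to rows[0]. */
void unpack_rows(uint64_t w, uint8_t rows[8]) {
    for (int i = 0; i < 8; i++)
        rows[i] = (uint8_t)(w >> (56 - 8 * i));
}
```

With this layout, "abcdabcd" becomes the single value 0x6162636461626364.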
Thanks!
Here is my code based on the trick from Hacker's Delight. Although it is CPU code, it can be easily transformed into parallel CUDA code.
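The answer's own code is not reproduced here. As a hedged illustration of the Hacker's Delight trick, the sketch below is modeled on the book's 32-bit 8x8 transpose. It loads the eight rows into two 32-bit words and transposes them with masks, shifts, and XORs instead of per-bit loops. The name transpose8_32 is hypothetical. The parameters m and n are the byte strides between rows of the source and destination, so the routine can work on one 8x8 tile inside a larger bitmap. It assumes the question's bit order (row 0 first, most significant bit = column 0).

```c
#include <stdint.h>

/* Sketch modeled on the 32-bit 8x8 bit-matrix transpose in Hacker's Delight.
   A and B point at the first row of the source and destination tiles;
   m and n are their row strides in bytes. */
void transpose8_32(const uint8_t *A, int m, int n, uint8_t *B) {
    uint32_t x, y, t;

    /* Rows 0-3 go into x and rows 4-7 into y, lower row index in higher byte. */
    x = ((uint32_t)A[0] << 24) | ((uint32_t)A[m] << 16) |
        ((uint32_t)A[2 * m] << 8) | (uint32_t)A[3 * m];
    y = ((uint32_t)A[4 * m] << 24) | ((uint32_t)A[5 * m] << 16) |
        ((uint32_t)A[6 * m] << 8) | (uint32_t)A[7 * m];

    /* Transpose every 2x2 block of bits. */
    t = (x ^ (x >> 7)) & 0x00AA00AAu;  x = x ^ t ^ (t << 7);
    t = (y ^ (y >> 7)) & 0x00AA00AAu;  y = y ^ t ^ (t << 7);

    /* Swap the off-diagonal 2x2 blocks inside each 4x4 block. */
    t = (x ^ (x >> 14)) & 0x0000CCCCu; x = x ^ t ^ (t << 14);
    t = (y ^ (y >> 14)) & 0x0000CCCCu; y = y ^ t ^ (t << 14);

    /* Swap the top-right 4x4 block (in x) with the bottom-left one (in y). */
    t = (x & 0xF0F0F0F0u) | ((y >> 4) & 0x0F0F0F0Fu);
    y = ((x << 4) & 0xF0F0F0F0u) | (y & 0x0F0F0F0Fu);
    x = t;

    /* Store the transposed rows back, one byte each. */
    B[0]     = (uint8_t)(x >> 24); B[n]     = (uint8_t)(x >> 16);
    B[2 * n] = (uint8_t)(x >> 8);  B[3 * n] = (uint8_t)x;
    B[4 * n] = (uint8_t)(y >> 24); B[5 * n] = (uint8_t)(y >> 16);
    B[6 * n] = (uint8_t)(y >> 8);  B[7 * n] = (uint8_t)y;
}
```

For a plain 8-byte array, call it with m = n = 1. Because it is straight-line 32-bit integer code with no branches, it should map naturally onto a CUDA thread (for example, one thread per 8x8 tile). That mapping is an assumption, though, not part of the original answer.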
My code is written for transposing bitmaps of arbitrary size. What you really need is only the part that transposes one uint64_t x into another uint64_t y.
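Here is a sketch of that 64-bit step. It assumes row 0 is the most significant byte and the most significant bit of each byte is column 0, which is the orientation of the tables in the question. It uses three delta swaps in the style of Hacker's Delight. The name transpose8x8 is hypothetical, and this is not the answerer's own code.

```c
#include <stdint.h>

/* Transpose an 8x8 bit matrix stored in one 64-bit word.
   Row 0 = most significant byte; column 0 = most significant bit of a row. */
uint64_t transpose8x8(uint64_t x) {
    uint64_t t;

    /* Transpose each 2x2 block of bits. */
    t = (x ^ (x >> 7)) & 0x00AA00AA00AA00AAULL;
    x = x ^ t ^ (t << 7);

    /* Swap the off-diagonal 2x2 blocks inside each 4x4 block. */
    t = (x ^ (x >> 14)) & 0x0000CCCC0000CCCCULL;
    x = x ^ t ^ (t << 14);

    /* Swap the two off-diagonal 4x4 blocks. */
    t = (x ^ (x >> 28)) & 0x00000000F0F0F0F0ULL;
    x = x ^ t ^ (t << 28);

    return x;
}
```

For the question's input packed as 0x6162636461626364, it returns 0x00FFFF00001166AA, i.e. the output bytes 0, 255, 255, 0, 0, 17, 102, 170. Each round costs about six shift/mask/XOR operations, so the whole block takes roughly 18 instead of 64 per-bit steps. Transposition is its own inverse, so the same function also converts back.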