How to bitwise flip an char array by diagonal effectively in C/C++ or Cuda?

286 views Asked by At

I have a char array char input[8] = "abcdabcd", and I want to bitwise flip it by diagonal, which mean
input:

input[0] == 'a': 0 1 1 0 0 0 0 1
input[1] == 'b': 0 1 1 0 0 0 1 0
input[2] == 'c': 0 1 1 0 0 0 1 1
input[3] == 'd': 0 1 1 0 0 1 0 0
input[4] == 'a': 0 1 1 0 0 0 0 1
input[5] == 'b': 0 1 1 0 0 0 1 0
input[6] == 'c': 0 1 1 0 0 0 1 1
input[7] == 'd': 0 1 1 0 0 1 0 0

output:

                   a b c d a b c d
output[0] ==   0 : 0 0 0 0 0 0 0 0
output[1] == 255 : 1 1 1 1 1 1 1 1
output[2] == 255 : 1 1 1 1 1 1 1 1
output[3] ==   0 : 0 0 0 0 0 0 0 0
output[4] ==   0 : 0 0 0 0 0 0 0 0
output[5] ==  17 : 0 0 0 1 0 0 0 1
output[6] == 102 : 0 1 1 0 0 1 1 0
output[7] == 170 : 1 0 1 0 1 0 1 0

It is obvious that we can use two loop combined with bitwise or operation to set the target bit one by one, however, this mean we need at least 64 * n operations, which I think is not effectively.
Since the input and output are just about reading memory in different directions (by row or by column), are there any more effectively way?
Besides, I think it is quite acceptable and make sense to do this operation based on special memory layout, or change the number or characters in array.
Thanks!

1

There are 1 answers

5
yhf8377 On BEST ANSWER

Here is my code based on the trick from Hacker's Delight. Although it is CPU code, it can be easily transformed into parallel CUDA code.

This code itself is for transposing bitmap of arbitrary sizes. What you really need is the code for transposing an uint64_t x to another uint64_t y.

using BitBlock = uint8_t;
using BitBlocks = std::vector<BitBlock>;

void FPTransMap::transpose_bitmap( BitBlocks& bitmap, size_type blocks_per_row )
{
    assert( bitmap.size() % blocks_per_row == 0 );
    assert( ( bitmap.size() / blocks_per_row ) % 8 == 0 );

    BitBlocks transposed( bitmap.size() );
    size_type nrow = bitmap.size() / blocks_per_row, row_blocks = nrow / 8;
    for ( index_type i = 0; i < row_blocks; ++i ) {
        for ( index_type j = 0; j < blocks_per_row; ++j ) {
            uint64_t x = ( uint64_t( bitmap[   i * 8       * blocks_per_row + j ] ) << 56 ) |
                         ( uint64_t( bitmap[ ( i * 8 + 1 ) * blocks_per_row + j ] ) << 48 ) |
                         ( uint64_t( bitmap[ ( i * 8 + 2 ) * blocks_per_row + j ] ) << 40 ) |
                         ( uint64_t( bitmap[ ( i * 8 + 3 ) * blocks_per_row + j ] ) << 32 ) |
                         ( uint64_t( bitmap[ ( i * 8 + 4 ) * blocks_per_row + j ] ) << 24 ) |
                         ( uint64_t( bitmap[ ( i * 8 + 5 ) * blocks_per_row + j ] ) << 16 ) |
                         ( uint64_t( bitmap[ ( i * 8 + 6 ) * blocks_per_row + j ] ) <<  8 ) |
                         ( uint64_t( bitmap[ ( i * 8 + 7 ) * blocks_per_row + j ] ) );
            uint64_t y = (x & 0x8040201008040201LL) |
                        ((x & 0x0080402010080402LL) <<  7) |
                        ((x & 0x0000804020100804LL) << 14) |
                        ((x & 0x0000008040201008LL) << 21) |
                        ((x & 0x0000000080402010LL) << 28) |
                        ((x & 0x0000000000804020LL) << 35) |
                        ((x & 0x0000000000008040LL) << 42) |
                        ((x & 0x0000000000000080LL) << 49) |
                        ((x >>  7) & 0x0080402010080402LL) |
                        ((x >> 14) & 0x0000804020100804LL) |
                        ((x >> 21) & 0x0000008040201008LL) |
                        ((x >> 28) & 0x0000000080402010LL) |
                        ((x >> 35) & 0x0000000000804020LL) |
                        ((x >> 42) & 0x0000000000008040LL) |
                        ((x >> 49) & 0x0000000000000080LL);
            transposed[ ( j * 8 ) * row_blocks + i ]     = uint8_t( ( y >> 56 ) & 0xFF );
            transposed[ ( j * 8 + 1 ) * row_blocks + i ] = uint8_t( ( y >> 48 ) & 0xFF );
            transposed[ ( j * 8 + 2 ) * row_blocks + i ] = uint8_t( ( y >> 40 ) & 0xFF );
            transposed[ ( j * 8 + 3 ) * row_blocks + i ] = uint8_t( ( y >> 32 ) & 0xFF );
            transposed[ ( j * 8 + 4 ) * row_blocks + i ] = uint8_t( ( y >> 24 ) & 0xFF );
            transposed[ ( j * 8 + 5 ) * row_blocks + i ] = uint8_t( ( y >> 16 ) & 0xFF );
            transposed[ ( j * 8 + 6 ) * row_blocks + i ] = uint8_t( ( y >> 8 ) & 0xFF );
            transposed[ ( j * 8 + 7 ) * row_blocks + i ] = uint8_t( y & 0xFF );
        }
    }
    std::swap( bitmap, transposed );
}