Double-word vector rotates on old AltiVec without a 64-bit data type


This is related to Power4 and lack of vector long long. On Power7 and Power8 we can perform:

typedef __vector unsigned long long uint64x2_p;
...

uint64x2_p val = {...};
uint64x2_p res = vec_rl(val, bits);  // bits is a uint64x2_p holding the per-element rotate counts

I need to find a workaround for the missing 64-bit vector type and rotate on Power4. I think there are two strategies: first, rotate in C/C++; second, use 32-bit vector types. I'm guessing the second is the faster strategy, given that the data is already in a vector register.
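For comparison, the first strategy (rotating in plain C/C++) might look like the sketch below. The names RotateLeft64 and RotateLanes64 are hypothetical, and the vector is assumed to have already been stored to a two-element array. That store and the later reload are the cost the second strategy tries to avoid.

```cpp
#include <cstdint>

// Hypothetical scalar helper: rotate a 64-bit value left by r bits.
// Masking keeps both shift counts in 0..63, so r == 0 is well defined.
inline uint64_t RotateLeft64(uint64_t v, unsigned int r)
{
    r &= 63;
    return (v << r) | (v >> ((64 - r) & 63));
}

// Rotate both 64-bit elements of a vector that was spilled to memory.
// Assumption: the caller stored the vector to lanes[] beforehand.
inline void RotateLanes64(uint64_t lanes[2], unsigned int r)
{
    lanes[0] = RotateLeft64(lanes[0], r);
    lanes[1] = RotateLeft64(lanes[1], r);
}
```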

I feel like this problem was solved long ago since there's nothing special about a double-word rotate. Unfortunately search is not returning useful hits: "power4" "doubleword" rotate.

I think I have the basic algorithm, which consists of three loads, two shifts, two permutes and an OR. But I'm not sure if there's a better approach.
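The shift, permute and OR steps can be reasoned about on a single 64-bit element split into two 32-bit words. The sketch below is a scalar model only, not AltiVec code, and Halves and RotateLeft64Halves are hypothetical names. For a per-word shift s with 0 < s < 32, each word is shifted left by s, each word is shifted right by 32 - s, the right-shifted words trade places, and the two results are ORed. A rotate of 32 or more additionally swaps the two words.

```cpp
#include <cstdint>

// Hypothetical model of one 64-bit element held as two 32-bit words.
struct Halves { uint32_t hi, lo; };

inline Halves RotateLeft64Halves(Halves v, unsigned int r)
{
    r &= 63;
    const unsigned int s = r & 31;   // per-word shift amount
    Halves out = v;
    if (s != 0)
    {
        // Bits shifted out of one word enter the bottom of the other word.
        out.hi = (v.hi << s) | (v.lo >> (32 - s));
        out.lo = (v.lo << s) | (v.hi >> (32 - s));
    }
    if (r >= 32)
    {
        // Rotating by 32 is a plain swap of the two words.
        const uint32_t t = out.hi;
        out.hi = out.lo;
        out.lo = t;
    }
    return out;
}
```

The s == 0 case is kept separate because shifting a 32-bit value by 32 is undefined in C++.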

How do I perform a 64-bit rotate when working on Power4, which lacks the double-word rotate?


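#include <altivec.h>

typedef __vector unsigned char uint8x16_p;
typedef __vector unsigned int  uint32x4_p;

// Rotate each 64-bit element left by R bits using only 32-bit lane operations
template <unsigned int R>
inline uint32x4_p VecRotateLeft64(const uint32x4_p val)
{
    enum {LSHIFT = R%32};
    enum {RSHIFT = (32 - LSHIFT)%32};
    enum {PERMUTE = (R%64) >= 32};

    // Swaps the two 32-bit words within each 64-bit element
    const uint8x16_p mask = {4,5,6,7, 0,1,2,3, 12,13,14,15, 8,9,10,11};
    uint32x4_p result = val;

    // Word shifts use only the low 5 bits of the count, so a count of 32
    // would leave the lanes unshifted; skip this step when R%32 == 0.
    if (LSHIFT != 0)
    {
        const uint32x4_p lbits = {LSHIFT,LSHIFT,LSHIFT,LSHIFT};
        const uint32x4_p left = vec_sl(val, lbits);

        const uint32x4_p rbits = {RSHIFT,RSHIFT,RSHIFT,RSHIFT};
        uint32x4_p right = vec_sr(val, rbits);

        // Bits shifted out of one word enter the other word
        right = vec_perm(right, right, mask);
        result = vec_or(left, right);
    }

    // Swap the 32-bit halves of each 64-bit element for rotates of 32 or more
    if (PERMUTE)
        result = vec_perm(result, result, mask);

    return result;
}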
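
A function like this can be checked on a machine without AltiVec by modelling the four 32-bit lanes as a plain array. The sketch below is such a model under stated assumptions, not the vector code itself. Lanes, SwapWords and ModelRotateLeft64 are hypothetical names, and lane 0 is assumed to be the high word of the first 64-bit element (big-endian lane order). R%32 == 0 is handled separately because the AltiVec word shifts use only the low 5 bits of the count, so a count of 32 behaves like a count of 0.

```cpp
#include <array>
#include <cstdint>

typedef std::array<uint32_t, 4> Lanes;

// Models vec_perm with mask {4,5,6,7, 0,1,2,3, 12,13,14,15, 8,9,10,11}.
inline Lanes SwapWords(const Lanes& v)
{
    return Lanes{{v[1], v[0], v[3], v[2]}};
}

template <unsigned int R>
inline Lanes ModelRotateLeft64(const Lanes& val)
{
    const unsigned int LSHIFT = R % 32;
    const unsigned int RSHIFT = (32 - LSHIFT) % 32;
    const bool PERMUTE = (R % 64) >= 32;

    Lanes result = val;
    if (LSHIFT != 0)
    {
        Lanes left, right;
        for (int i = 0; i < 4; ++i)
        {
            left[i]  = val[i] << LSHIFT;   // models vec_sl
            right[i] = val[i] >> RSHIFT;   // models vec_sr
        }
        right = SwapWords(right);
        for (int i = 0; i < 4; ++i)
            result[i] = left[i] | right[i];   // models vec_or
    }
    if (PERMUTE)
        result = SwapWords(result);
    return result;
}
```

Comparing the model against a plain 64-bit rotate for every R from 0 to 63 exercises the boundary cases, R = 0 and R = 32 in particular.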