Opt out of `char` exemption to strict aliasing rules


A simple piece of code using uint32_t is optimised better than the same code using uint8_t. As far as I know, this is because uint8_t is usually a typedef for unsigned char, and the character types are exempt from the strict aliasing rules. Consider:

#include <cstdint>

using T = uint32_t;

T *a;
T *b;
T *c;

void mult(int num)
{
    for (int count = 0; count < num; count++)
    {
        a[count] = b[count] * c[count];
    }
}

https://godbolt.org/z/sW1xnTrhc

At -O1 this has an inner loop of:

.LBB0_2:                                # =>This Inner Loop Header: Depth=1
        mov     r8d, dword ptr [rcx + 4*rdi]
        imul    r8d, dword ptr [rax + 4*rdi]
        mov     dword ptr [rdx + 4*rdi], r8d
        inc     rdi
        cmp     rsi, rdi
        jne     .LBB0_2

Note that in this case it simply loads one value, does a multiply, stores the result, and loops. This is good. However, if I use uint8_t (https://godbolt.org/z/doM4o6ena) I get this inner loop from clang:

.LBB0_2:                                # =>This Inner Loop Header: Depth=1
        mov     rsi, qword ptr [rip + b] # see here
        mov     rax, qword ptr [rip + c] # see here
        movzx   eax, byte ptr [rax + rdx]
        mul     byte ptr [rsi + rdx]
        mov     rsi, qword ptr [rip + a] # see here
        mov     byte ptr [rsi + rdx], al
        inc     rdx
        cmp     rcx, rdx
        jne     .LBB0_2

Note that this inner loop reloads the values of a, b and c on every single iteration. As I understand it, this is because the storage of the pointers a, b and c themselves may alias with what they point to: a store through a[count] could change any of the three pointers, so the compiler cannot keep them in registers across iterations. This gets even worse at higher optimisation levels. Using uint16_t or uint32_t with -O3, the compiler does all sorts of SIMD/XMM wizardry, but the uint8_t/char loop remains stubbornly simple and unoptimised.
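To make the aliasing concern above concrete, here is a minimal sketch of a store through a uint8_t pointer legitimately changing a global pointer. It assumes uint8_t is a typedef for unsigned char, which is the usual case but not guaranteed. The names first, second, src, copy_then_read and demo are hypothetical and exist only for this illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical buffers for illustration only.
static std::uint8_t first[4]  = {1, 2, 3, 4};
static std::uint8_t second[4] = {5, 6, 7, 8};

std::uint8_t *src = first;  // a global pointer, like a, b and c in the question

// Copies n bytes to dst, then reads through the global pointer src.
// Assuming uint8_t is unsigned char, the stores through dst may legally
// modify the bytes of src itself, so src has to be reloaded after the loop.
std::uint8_t copy_then_read(std::uint8_t *dst, const std::uint8_t *bytes, std::size_t n)
{
    for (std::size_t i = 0; i < n; i++)
    {
        dst[i] = bytes[i];
    }
    return src[0];
}

// Overwrites the bytes of src with the bytes of a pointer to second.
std::uint8_t demo()
{
    src = first;
    std::uint8_t *target = second;
    std::uint8_t repr[sizeof target];
    std::memcpy(repr, &target, sizeof target);  // object representation of the pointer
    return copy_then_read(reinterpret_cast<std::uint8_t *>(&src), repr, sizeof repr);
}
```

Here demo() reads from second rather than first, because the byte stores redirected src. With a uint32_t destination the compiler is allowed to assume this cannot happen, which is why the uint32_t loop can keep its pointers in registers.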

Note I am not asking for ways round this using restrict, or avoiding global variables. Nor am I asking for ways to optimise this specific example.

What I am asking is whether there is a simple 8-bit arithmetic type I can use that does not fall into this trap.
