Conditionally flip sign of float with SSE and/or AVX


Given bitchar[], an array of 0s and 1s, I want to flip the sign of in[i] if bitchar[i] == 1 (scrambling):

float *in = get_in();
float *out = get_out();
char *bitchar = get_bitseq();
for (int i = 0; i < size; i++) {
   out[i] = in[i] * (1 - 2 * bitchar[i]);
}

My AVX code:

__m256 xmm1 = _mm256_set1_ps(1);
__m256 xmm2 = _mm256_set1_ps(2);
for (int i = 0; i < size; i+=8) {
   /* _mm256_setr_ps takes its arguments in memory order, lowest element first */
   __m256 xmmb = _mm256_setr_ps (bitchar[i], bitchar[i+1], bitchar[i+2], bitchar[i+3], bitchar[i+4], bitchar[i+5], bitchar[i+6], bitchar[i+7]);
   __m256 xmmpm1 = _mm256_sub_ps(xmm1, _mm256_mul_ps(xmm2,xmmb));
   __m256 xmmout = _mm256_mul_ps(_mm256_load_ps(&in[i]),xmmpm1);
   _mm256_store_ps(&out[i],xmmout);
}

However, the AVX code is not much faster than the scalar loop, and is sometimes even slower. Maybe my AVX code is not optimal. Could anyone help me?


There is 1 answer

Anna Noie

Thanks, everyone, for the hints. I came up with this solution using SSE4.1. Any better solution would be appreciated.

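    /* Needs <smmintrin.h> (SSE4.1) and <string.h>. */
    const int size4 = (size / 4) * 4;
    for (int i = 0; i < size4; i += 4) {
        /* Load exactly 4 bytes of bitchar, so nothing past the array is read. */
        int bits4;
        memcpy(&bits4, &bitchar[i], sizeof bits4);
        /* Zero-extend the 4 bytes into four 32-bit lanes. */
        __m128i xmm1 = _mm_cvtepu8_epi32(_mm_cvtsi32_si128(bits4));
        /* Shift each 0/1 into the sign-bit position (bit 31). */
        __m128 xmm2 = _mm_castsi128_ps(_mm_slli_epi32(xmm1, 31));
        /* XOR flips the sign of in[i] wherever the bit is 1. */
        __m128 xmm3 = _mm_xor_ps(xmm2, _mm_loadu_ps(&in[i]));
        _mm_storeu_ps(&out[i], xmm3);
    }
    /* Scalar tail for the remaining (size % 4) elements. */
    for (int i = size4; i < size; i++) {
        out[i] = in[i] * (1 - 2 * bitchar[i]);
    }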
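
For anyone wondering why the shift-and-XOR works: a 32-bit IEEE-754 float keeps its sign in bit 31, so XORing that bit with 0 leaves the value alone and XORing it with 1 negates it. Here is a minimal scalar sketch of the same idea in portable C, assuming float is a 32-bit IEEE-754 type. The names flip_sign_if and scramble_scalar are made up for illustration and are not part of the code above.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not from the original post): returns -x when
   bit == 1 and x when bit == 0. Assumes float is 32-bit IEEE-754. */
static float flip_sign_if(float x, unsigned char bit)
{
    uint32_t u;
    memcpy(&u, &x, sizeof u);      /* reinterpret the float's bits */
    u ^= (uint32_t)bit << 31;      /* bit 31 is the sign bit */
    memcpy(&x, &u, sizeof x);
    return x;
}

/* Scalar loop built on the helper; for non-NaN inputs it gives the same
   values as in[i] * (1 - 2 * bitchar[i]). */
static void scramble_scalar(const float *in, float *out,
                            const char *bitchar, int size)
{
    for (int i = 0; i < size; i++) {
        out[i] = flip_sign_if(in[i], (unsigned char)bitchar[i]);
    }
}
```

This can double as a reference when checking the output of a SIMD version, since the vector loop applies the same operation to four lanes at once.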