Given bitchar[], an array of 0s and 1s, I want to flip the sign of in[i] whenever bitchar[i] == 1 (scrambling):
float *in = get_in();
float *out = get_out();
char *bitchar = get_bitseq();
for (int i = 0; i < size; i++) {
    out[i] = in[i] * (1 - 2 * bitchar[i]);  // multiplier is +1 when the bit is 0, -1 when it is 1
}
My AVX code:
__m256 xmm1 = _mm256_set1_ps(1.0f);  // broadcast 1.0 to all 8 lanes
__m256 xmm2 = _mm256_set1_ps(2.0f);  // broadcast 2.0 to all 8 lanes
for (int i = 0; i < size; i += 8) {
    // _mm256_set_ps takes its arguments from the highest lane down to the lowest
    __m256 xmmb = _mm256_set_ps(bitchar[i+7], bitchar[i+6], bitchar[i+5], bitchar[i+4], bitchar[i+3], bitchar[i+2], bitchar[i+1], bitchar[i]);
    __m256 xmmpm1 = _mm256_sub_ps(xmm1, _mm256_mul_ps(xmm2, xmmb));  // 1 - 2 * bit
    __m256 xmmout = _mm256_mul_ps(_mm256_load_ps(&in[i]), xmmpm1);
    _mm256_store_ps(&out[i], xmmout);
}
However, the AVX code is not much faster than the scalar loop, and is sometimes even slower. Maybe my AVX code is not optimal. Could anyone help me?
Thanks, everyone, for the hints. I came up with this solution using SSE4.1. Any better solution would be appreciated.
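The code from that update does not appear here. As a rough sketch only, and not necessarily what was actually posted, one SSE4.1 way to do the flip is to widen four flag bytes to 32-bit lanes, shift each flag into the sign-bit position, and XOR the result with the floats. The function name `flip_signs_sse41` is made up for illustration. The sketch assumes every flag byte is exactly 0 or 1, and it uses unaligned loads and stores because the alignment of the buffers is not known.

```c
#include <assert.h>
#include <smmintrin.h>  /* SSE4.1: _mm_cvtepi8_epi32 */
#include <string.h>     /* memcpy */

/* Hypothetical helper: out[i] = bits[i] ? -in[i] : in[i].
   Assumes every bits[i] is exactly 0 or 1. */
#if defined(__GNUC__)
__attribute__((target("sse4.1")))  /* lets GCC/Clang build this without -msse4.1 */
#endif
void flip_signs_sse41(const float *in, float *out, const char *bits, int size)
{
    assert(size >= 0);
    int i = 0;
    for (; i + 4 <= size; i += 4) {
        int packed;
        memcpy(&packed, bits + i, sizeof packed);   /* read 4 flag bytes at once */
        __m128i b8  = _mm_cvtsi32_si128(packed);    /* flags sit in the low 4 bytes */
        __m128i b32 = _mm_cvtepi8_epi32(b8);        /* widen each byte to a 32-bit lane */
        __m128i sgn = _mm_slli_epi32(b32, 31);      /* 1 -> 0x80000000, 0 -> 0 */
        __m128  v   = _mm_loadu_ps(in + i);         /* unaligned load of 4 floats */
        _mm_storeu_ps(out + i, _mm_xor_ps(v, _mm_castsi128_ps(sgn)));
    }
    for (; i < size; i++)                           /* scalar tail for leftover elements */
        out[i] = bits[i] ? -in[i] : in[i];
}
```

The idea behind this shape is that XOR on the sign bit replaces the int-to-float conversion and both multiplications, and the four flags are fetched with one 32-bit read instead of being inserted lane by lane. Whether it actually beats the scalar loop on a given compiler and CPU would still need to be measured.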