I am looking for a way to permute the 1-byte and/or 2-byte values in an __m256i
register using AVX2 instructions. The solution needs to be able to move values across 128-bit lanes.
I know that with AVX-512 I could use _mm256_permutexvar_epi8 (AVX512VBMI + AVX512VL)
and _mm256_permutexvar_epi16 (AVX512BW + AVX512VL),
but I can't seem to find any generic solution with AVX2 for when the values need to cross lanes (if the values can stay within their lane, _mm256_shuffle_epi8
or _mm256_shufflehi_epi16(_mm256_shufflelo_epi16)
works).
The permutation indices will be known at compile time.
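
To make the requirement concrete, here is a minimal sketch of one commonly described AVX2 workaround for the byte case: shuffle the register in-lane, shuffle a lane-swapped copy in-lane, and OR the two results. The helper name `permute_bytes_avx2`, its array-based signature, and the runtime mask construction are my own assumptions for illustration. With compile-time indices the two masks would simply be constants. The `target("avx2")` attribute assumes GCC or Clang.

```c
#include <assert.h>
#include <immintrin.h>
#include <stdint.h>

/* Hypothetical helper (name and signature are illustrative only):
 * computes dst[i] = src[idx[i]] for i in 0..31, with every idx[i] in 0..31. */
__attribute__((target("avx2")))
void permute_bytes_avx2(const uint8_t src[32], const uint8_t idx[32],
                        uint8_t dst[32])
{
    uint8_t same[32];   /* mask for bytes whose source is in the same lane  */
    uint8_t cross[32];  /* mask for bytes whose source is in the other lane */

    for (int i = 0; i < 32; i++) {
        assert(idx[i] < 32);
        int in_lane = (idx[i] >> 4) == (i >> 4);
        uint8_t off = idx[i] & 15;          /* byte offset inside the lane */
        /* 0x80 sets the high bit, which makes vpshufb write zero. */
        same[i]  = in_lane ? off : 0x80;
        cross[i] = in_lane ? 0x80 : off;
    }

    __m256i v = _mm256_loadu_si256((const __m256i *)src);
    /* 0x4E selects qwords 2,3,0,1, i.e. it swaps the two 128-bit lanes. */
    __m256i swapped = _mm256_permute4x64_epi64(v, 0x4E);

    __m256i a = _mm256_shuffle_epi8(
        v, _mm256_loadu_si256((const __m256i *)same));
    __m256i b = _mm256_shuffle_epi8(
        swapped, _mm256_loadu_si256((const __m256i *)cross));

    /* Each output byte is non-zero in at most one of a and b. */
    _mm256_storeu_si256((__m256i *)dst, _mm256_or_si256(a, b));
}
```

The same pattern should cover 2-byte elements if each word index w is expanded to the byte indices 2w and 2w+1 before building the masks. The cost is one lane swap, two in-lane shuffles, and one OR, so it is only a sketch of the kind of generic solution I am asking about, not necessarily the best one.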