Efficient way to extract from an SSE vector on AMD processors


I'm looking for an efficient way to extract the lower 64-bit integer from an __m128i on AMD Piledriver. Something like this:

static inline int64_t extractlo_64(__m128i x)
{
    int64_t result;
    // extract into result
    return result;
}

Instruction tables say that the common approach, using _mm_extract_epi64(), is inefficient on this processor. It generates a PEXTRQ instruction, which has a latency of 10 cycles (compared to 2-3 cycles on Intel processors). Is there a better way to do this?


There are 2 answers

0
Marat Dukhan On BEST ANSWER

On x86-64 you can use _mm_cvtsi128_si64, which compiles to a single MOVQ r64, xmm instruction.
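For illustration, here is how the skeleton from the question might be filled in with that intrinsic. This is only a sketch: it assumes an x86-64 target (the intrinsic is not available in 32-bit builds) and a compiler that provides <emmintrin.h>. The exact instruction emitted is up to the compiler, so it is worth checking the generated assembly.

```c
#include <stdint.h>
#include <emmintrin.h>  /* SSE2: __m128i, _mm_cvtsi128_si64 */

/* Return the low 64 bits of x as a signed integer.
 * x86-64 only; compilers typically emit MOVQ r64, xmm for this. */
static inline int64_t extractlo_64(__m128i x)
{
    return _mm_cvtsi128_si64(x);
}
```

For example, extractlo_64(_mm_set_epi64x(7, -5)) returns -5, because _mm_set_epi64x takes the high element first and the low element second.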

6
Paul R On

One possibility might be to use MOVDQ2Q, which has a latency of 2 cycles on Piledriver:

static inline int64_t extractlo_64(const __m128i v)
{
    return _m_to_int64(_mm_movepi64_pi64(v)); // MOVDQ2Q + MOVQ
}