Sum of the four 32-bit elements of a __m128 vector

Question

1.4k views Asked by Merkil At 15 April 2012 at 16:05

I'm using intrinsics to optimize a program of mine, and I would now like to sum the four elements of a __m128 vector so that I can compare the result to a floating-point value. For instance, say I have the 128-bit vector {a, b, c, d}. How can I compare a+b+c+d to e, where e is of type float?

Does SSE2 or SSE3 provide a simple way to do that, or do you have a code snippet that could help me? Thanks!

There is 1 answer

**harold** · Accepted Answer · 2012-04-15T17:12:44+00:00

The best I can come up with is this:

; assumes    xmm0 = [0, B, 0, A] or similar
mulps xmm0,xmm0   ; [0, B*B, 0, A*A]
xorps xmm1,xmm1
movhlps xmm1,xmm0 ; [0, 0, 0, B * B]
addps xmm0,xmm1   ; [0, B * B, 0, A * A + B * B], result is in the low element

If A and B absolutely have to be in the low quadword then as far as I can tell you need a shuffle, which is slower on pre-Penryn (and on a Penryn the DPPS solution is available).
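The assembly above adds two squared elements. For the four-element sum the question asks about, the same movhlps-then-add idea can be applied twice. The following is a minimal sketch using SSE intrinsics, not the answerer's code; `hsum_ps` is a hypothetical helper name chosen for illustration, and the second step uses a shuffle, which the answer notes is slower on pre-Penryn processors.

```c
#include <xmmintrin.h>  /* SSE: __m128, _mm_movehl_ps, _mm_add_ps, ... */

/* Hypothetical helper (not part of any library):
 * returns v[0] + v[1] + v[2] + v[3]. */
static float hsum_ps(__m128 v)
{
    /* Copy the high pair of v into the low pair: lanes 0,1 = v2, v3. */
    __m128 hi   = _mm_movehl_ps(v, v);
    /* Lane 0 = v0 + v2, lane 1 = v1 + v3 (upper lanes are not used). */
    __m128 sum2 = _mm_add_ps(v, hi);
    /* Broadcast lane 1 so that v1 + v3 is available in lane 0. */
    __m128 odd  = _mm_shuffle_ps(sum2, sum2, _MM_SHUFFLE(1, 1, 1, 1));
    /* Scalar add in lane 0: (v0 + v2) + (v1 + v3). */
    __m128 sum  = _mm_add_ss(sum2, odd);
    /* Extract lane 0 as a plain float. */
    return _mm_cvtss_f32(sum);
}
```

The returned float can then be compared with e directly, for example `if (hsum_ps(v) > e) { ... }`. Note that the additions are grouped as (a+c)+(b+d), so the rounding may differ slightly from a left-to-right a+b+c+d.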
