I have to implement matrix-vector multiplication using SSE/SSE2. Both the vector and the matrix are large. The matrix is double, the vector is float.
The point is that I have to do all calculations in floats: when I read data from the matrix I convert it to float, do the calculations, and get a float vector. (Later, after some additional calculations on floats, I have to add some float values (a float matrix) to double values (a double matrix).)
My question is how I can do this using SSE/SSE2. The problem is with the doubles: I have a double* pointer and I somehow have to convert 4 doubles into 4 floats to fit in an __m128. Are there any instructions to do that?
You need to call __m128 _mm_cvtpd_ps(__m128d a) (CVTPD2PS) twice to get two single-precision float vectors, each containing two of your original double-precision values in its lower two elements, then merge these two float vectors into a single vector, using e.g. __m128 _mm_shuffle_ps(__m128 a, __m128 b, unsigned int imm8) (SHUFPS).
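A minimal sketch of the two-conversions-plus-shuffle approach might look like the following. The helper names load4_double_as_float and row_dot are hypothetical (they are not from the question), and the sketch assumes unaligned data and a row length that is a multiple of 4; a real implementation would also need a scalar tail loop for leftover elements.

```c
#include <emmintrin.h>  /* SSE2 intrinsics; also provides the SSE ones */
#include <stddef.h>

/* Hypothetical helper: read 4 doubles from src and return them as 4 floats. */
static inline __m128 load4_double_as_float(const double *src)
{
    /* Each CVTPD2PS yields two floats in the low half; the high half is zero. */
    __m128 lo = _mm_cvtpd_ps(_mm_loadu_pd(src));      /* [f0, f1, 0, 0] */
    __m128 hi = _mm_cvtpd_ps(_mm_loadu_pd(src + 2));  /* [f2, f3, 0, 0] */

    /* SHUFPS: take elements 0,1 of lo and elements 0,1 of hi. */
    return _mm_shuffle_ps(lo, hi, _MM_SHUFFLE(1, 0, 1, 0)); /* [f0, f1, f2, f3] */
}

/* Hypothetical helper: dot product of one double matrix row with a float
 * vector, computed entirely in single precision.
 * Assumption: n is a multiple of 4. */
static float row_dot(const double *row, const float *vec, size_t n)
{
    __m128 acc = _mm_setzero_ps();
    for (size_t i = 0; i < n; i += 4) {
        __m128 m = load4_double_as_float(row + i);
        __m128 v = _mm_loadu_ps(vec + i);
        acc = _mm_add_ps(acc, _mm_mul_ps(m, v));
    }

    /* Horizontal sum of the four partial sums. */
    float tmp[4];
    _mm_storeu_ps(tmp, acc);
    return (tmp[0] + tmp[1]) + (tmp[2] + tmp[3]);
}
```

Calling row_dot once per matrix row gives one element of the result vector. Because the upper two lanes of each converted vector are zero, _mm_movelh_ps(lo, hi) should produce the same merged result as the shuffle; if the data is known to be 16-byte aligned, _mm_load_pd and _mm_load_ps can replace the unaligned loads.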