SSE for 2D arrays


I want to rewrite the following loop using SSE3 intrinsics:

    for (i=0; i<=imax+1; i++) {
        /* The vertical velocity approaches 0 at the north and south
         * boundaries, but fluid flows freely in the horizontal direction */
        v[i][jmax] = 0.0;
        u[i][jmax+1] = u[i][jmax];

        v[i][0] = 0.0;
        u[i][0] = u[i][1];
    }

u and v are 2D arrays of type float. This is what I have so far, but the program does not run correctly.

    int loop2 = ((imax+1) / loopFactor) * loopFactor;
    for(i=0; i<loop2; i+=loopFactor) {
        
        __m128 zeroVec = _mm_set1_ps(0.0f);
        _mm_storeu_ps(&v[i][jmax], zeroVec);
        __m128 umaxVec = _mm_loadu_ps(&u[i][jmax]);
        _mm_storeu_ps(&u[i][jmax+1], umaxVec);

        __m128 zVec = _mm_set1_ps(0.0f);
        _mm_storeu_ps(&v[i][0], zVec);
        __m128 uVec = _mm_loadu_ps(&u[i][1]);
        _mm_storeu_ps(&u[i][0], uVec);
    }
    for (; i<=imax+1; i++){
        v[i][jmax] = 0.0;
        u[i][jmax+1] = u[i][jmax];

        v[i][0] = 0.0;
        u[i][0] = u[i][1];
    }

I suspect this is because _mm_loadu_ps loads the contiguous values u[i][1], u[i][2], u[i][3] and u[i][4], whereas I want the values u[i][1], u[i+1][1], u[i+2][1] and u[i+3][1], one from each of four consecutive rows. How can I do that? loopFactor has a value of 4. Any help is really appreciated.
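To make the problem concrete: the loop walks down a column, so consecutive iterations touch elements in different rows, which are not adjacent in memory. _mm_loadu_ps and _mm_storeu_ps always move four adjacent floats, so the attempt above overwrites v[i][jmax..jmax+3] and shifts part of row i of u instead of updating four rows. SSE has no gather or scatter instruction, so one possible workaround is to build the vector with _mm_set_ps and write the lanes back one row at a time. The following is only a sketch under assumptions: it assumes u and v are arrays of row pointers (float **) with at least jmax+2 floats per row, and the function name apply_ns_boundary is made up for illustration.

```c
#include <xmmintrin.h>  /* SSE: __m128, _mm_set_ps, _mm_storeu_ps */

/* Hypothetical helper: applies the north/south boundary conditions to
 * rows 0..imax+1. Assumes u and v are arrays of row pointers and that
 * every row holds at least jmax+2 floats. */
void apply_ns_boundary(float **u, float **v, int imax, int jmax)
{
    int i, k;
    int n = imax + 2;            /* number of rows: i = 0..imax+1 */
    int vec_end = (n / 4) * 4;   /* largest multiple of 4 not above n */
    float tmp[4];

    for (i = 0; i < vec_end; i += 4) {
        /* Gather one element from each of four rows.
         * _mm_set_ps takes its arguments from the highest lane down. */
        __m128 north = _mm_set_ps(u[i+3][jmax], u[i+2][jmax],
                                  u[i+1][jmax], u[i][jmax]);
        __m128 south = _mm_set_ps(u[i+3][1], u[i+2][1],
                                  u[i+1][1], u[i][1]);

        /* No scatter store in SSE: spill to a temporary, then write
         * each lane back to its own row. */
        _mm_storeu_ps(tmp, north);
        for (k = 0; k < 4; k++)
            u[i+k][jmax+1] = tmp[k];

        _mm_storeu_ps(tmp, south);
        for (k = 0; k < 4; k++)
            u[i+k][0] = tmp[k];

        /* The zero stores are strided too, so they stay scalar. */
        for (k = 0; k < 4; k++) {
            v[i+k][jmax] = 0.0f;
            v[i+k][0] = 0.0f;
        }
    }

    /* Scalar remainder for the rows that do not fill a full vector. */
    for (; i < n; i++) {
        v[i][jmax] = 0.0f;
        u[i][jmax+1] = u[i][jmax];
        v[i][0] = 0.0f;
        u[i][0] = u[i][1];
    }
}
```

Because every lane still needs its own scalar load and store, this version does no arithmetic in the vector registers and is unlikely to beat the original scalar loop. Contiguous vector loads and stores would only apply if the data were laid out so that the index being looped over is the fastest-varying one, for example when the loop runs along a row instead of down a column.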
