SSE optimization of Gaussian blur

Question

SSE optimization of Gaussian blur

1.1k views Asked by Smarty77 At 18 December 2013 at 18:51

I'm working on a school project , I have to optimize part of code in SSE, but I'm stuck on one part for few days now.

I dont see any smart way of using vector SSE instructions(inline assembler / instric f) in this code(its a part of guassian blur algorithm). I would be glad if somebody could give me just a small hint

for (int x = x_start; x < x_end; ++x)     // vertical blur...
    {
        float sum = image[x + (y_start - radius - 1)*image_w];
        float dif = -sum;

        for (int y = y_start - 2*radius - 1; y < y_end; ++y)
        {                                                   // inner vertical Radius loop           
            float p = (float)image[x + (y + radius)*image_w];   // next pixel
            buffer[y + radius] = p;                         // buffer pixel
            sum += dif + fRadius*p;
            dif += p;                                       // accumulate pixel blur

            if (y >= y_start)
            {
                float s = 0, w = 0;                         // border blur correction
                sum -= buffer[y - radius - 1]*fRadius;      // addition for fraction blur
                dif += buffer[y - radius] - 2*buffer[y];    // sum up differences: +1, -2, +1

                // cut off accumulated blur area of pixel beyond the border
                // assume: added pixel values beyond border = value at border
                p = (float)(radius - y);                   // top part to cut off
                if (p > 0)
                {
                    p = p*(p-1)/2 + fRadius*p;
                    s += buffer[0]*p;
                    w += p;
                }
                p = (float)(y + radius - image_h + 1);               // bottom part to cut off
                if (p > 0)
                {
                    p = p*(p-1)/2 + fRadius*p;
                    s += buffer[image_h - 1]*p;
                    w += p;
                }
                new_image[x + y*image_w] = (unsigned char)((sum - s)/(weight - w)); // set blurred pixel
            }
            else if (y + radius >= y_start)
            {
                dif -= 2*buffer[y];
            }
        } // for y
    } // for x

Original Q&A

There are 1 answers

**klm123** · Accepted Answer · 2013-12-18T19:08:15+00:00

One more feature you can use is logical operations and masks:

for example instead of:

  // process only 1
if (p > 0)
    p = p*(p-1)/2 + fRadius*p;

you can write

  // processes 4 floats
const __m128 &mask = _mm_cmplt_ps(p,0);
const __m128 &notMask = _mm_cmplt_ps(0,p);
const __m128 &p_tmp = ( p*(p-1)/2 + fRadius*p );
p = _mm_add_ps(_mm_and_ps(p_tmp, mask), _mm_and_ps(p, notMask)); // = p_tmp & mask + p & !mask

Also I can recommend you to use a special libraries, which overloads instructions. For example: http://code.compeng.uni-frankfurt.de/projects/vc
dif variable makes iterations of inner loop dependent. You should try to parallelize the outer loop. But with out instructions overloading the code will become unmanageable then.
Also consider rethinking the whole algorithm. Current one doesn't look paralell. May be you can neglect precision, or increase scalar time a bit?

TechQA.

SSE optimization of Gaussian blur

There are 1 answers

Related Questions in C++

Related Questions in OPTIMIZATION

Related Questions in SSE

Related Questions in SIMD

Related Questions in GAUSSIANBLUR

Popular Questions

Trending Questions