Simple speed up of C++ OpenMP kernel

Question

Simple speed up of C++ OpenMP kernel

220 views Asked by Pieter Bosma At 08 June 2015 at 12:26

I have never worked with OpenMP or optimization of C++, so all help is welcome. I'm probably doing some very stupid things that slow down the process drastically. It doesn't need to be the fastest, but I think some easy tricks will significantly speed it up. Anyone? Thanks a lot!

This function calculates the standard deviation of a patch, given a kernel size and greyscale OpenCV image. The middle pixel of the patch is kept if it is below the given threshold, else it is rejected. This is done for each pixel except the border.

#include "stdafx.h"
#include "opencv2/imgproc/imgproc.hpp"
#include "opencv2/highgui/highgui.hpp"
#include "opencv2/photo/photo.hpp"
#include <stdlib.h>
#include <stdio.h>
#include "utils.h"
#include <windows.h>
#include <string.h>
#include <math.h>
#include <numeric>

using namespace cv;
using namespace std;
Mat low_pass_filter(Mat img, int threshold, int kernelSize)
{
    unsigned char *input = (unsigned char*)(img.data);
    Mat output = Mat::zeros(img.size(), CV_8UC1);
    unsigned char *output_ptr = (unsigned char*)(output.data);

    #pragma omp parallel for
    for (int i = (kernelSize - 1) / 2; i < img.rows - (kernelSize - 1) / 2; i++){
        for (int j = (kernelSize - 1) / 2; j < img.cols - (kernelSize - 1) / 2; j++){
            double sum, m, accum, stdev;
            vector<double> v;
            // Kernel Patch
            for (int kx = i - (kernelSize - 1) / 2; kx <= i + (kernelSize - 1) / 2; kx++){
                for (int ky = j - (kernelSize - 1) / 2; ky <= j + (kernelSize - 1) / 2; ky++){
                    v.push_back((double)input[img.step * kx + ky]);//.at<uchar>(kx, ky));
                }
            }
            sum = std::accumulate(std::begin(v), std::end(v), 0.0);
            m = sum / v.size();

            accum = 0.0;
            std::for_each(std::begin(v), std::end(v), [&](const double d) {
                accum += (d - m) * (d - m);
            });

            stdev = sqrt(accum / (v.size() - 1));
            if (stdev < threshold){
                output_ptr[img.step * i + j] = input[img.step * i + j];
            }
        }
    }
    return output;
}

Original Q&A

There are 1 answers

**Toby Speight** · Accepted Answer · 2015-06-08T13:02:49+00:00

Vector v is not required. Instead of adding items to it, maintain accumulators of d and d*d, and then use variance = E(v²) / E(v)² so that your inner code becomes:

        double sum = 0;
        double sum2 = 0;
        int n = kernelSize * kernelSize;
        // Kernel Patch
        for (int kx = ...) {
            for (int ky = ...) {
                sum += d;
                sum2 += d*d;
            }
        }

        double mean = sum/n;
        double stddev = sqrt(sum2/n - mean*mean);
        if (stddev < threshold) {
            ...;
        }

After that, consider that the sum of elements centred around (x+1,y) can be found from the result for (x,y) simply by subtracting all the elements in the previous left-hand column, and adding all the elements in the new right-hand column. An analogous operation works vertically.

Also, check your compiler options - are you auto-vectorizing loops, and using SIMD instructions (if available)?

TechQA.

Simple speed up of C++ OpenMP kernel

There are 1 answers

Related Questions in C++

Related Questions in OPENCV

Related Questions in OPENMP

Popular Questions

Popular Tags

Trending Questions