nppiFilter breaks output image


I wrote an example of BoxFilter using NPP, but the output image looks broken. This is my code:

#include <stdio.h>
#include <string.h>

#include <ImagesCPU.h>
#include <ImagesNPP.h>
#include <Exceptions.h>

#include <npp.h>
#include "utils.h"


void boxfilter1_transform( Npp8u *data, int width, int height ){
    size_t size = width * height * 4;

    // declare a host image object for an 8-bit RGBA image
    npp::ImageCPU_8u_C4 oHostSrc(width, height);

    Npp8u *nDstData = oHostSrc.data();
    memcpy(nDstData, data, size * sizeof(Npp8u));

    // declare a device image and copy construct from the host image,
    // i.e. upload host to device
    npp::ImageNPP_8u_C4 oDeviceSrc(oHostSrc);

    // create struct with box-filter mask size
    NppiSize oMaskSize = {3, 3};

    // Allocate memory for pKernel
    Npp32s hostKernel[9] = {1, 1, 1, 1, 1, 1, 1, 1, 1};
    Npp32s *pKernel;

    checkCudaErrors( cudaMalloc((void**)&pKernel, oMaskSize.width * oMaskSize.height * sizeof(Npp32s)) );
    checkCudaErrors( cudaMemcpy(pKernel, hostKernel, oMaskSize.width * oMaskSize.height * sizeof(Npp32s),
                                cudaMemcpyHostToDevice) );

    Npp32s nDivisor = 9;

    // create struct with ROI size given the current mask
    NppiSize oSizeROI = {oDeviceSrc.width() - oMaskSize.width + 1, oDeviceSrc.height() - oMaskSize.height + 1};
    // allocate device image of appropriately reduced size
    npp::ImageNPP_8u_C4 oDeviceDst(oSizeROI.width, oSizeROI.height);
    // set anchor point inside the mask
    NppiPoint oAnchor = {2, 2};

    // run box filter
    NppStatus eStatusNPP;
    eStatusNPP = nppiFilter_8u_C4R(oDeviceSrc.data(), oDeviceSrc.pitch(),
                                   oDeviceDst.data(), oDeviceDst.pitch(),
                                   oSizeROI, pKernel, oMaskSize, oAnchor, nDivisor);
    //printf("NppiFilter error status %d\n", eStatusNPP);
    NPP_DEBUG_ASSERT(NPP_NO_ERROR == eStatusNPP);

    // declare a host image for the result
    npp::ImageCPU_8u_C4 oHostDst(oDeviceDst.size());
    // and copy the device result data into it
    oDeviceDst.copyTo(oHostDst.data(), oHostDst.pitch());
    memcpy(data, oHostDst.data(), size * sizeof(Npp8u));

    // release the device copy of the kernel allocated above
    checkCudaErrors( cudaFree(pKernel) );

    return;
}

Most of the code was copied from the boxFilterNPP.cpp sample. This is the output image: http://img153.imageshack.us/img153/7716/o8z.png

Why does this happen?

Answer by Robert Crovella (best answer)

You have a striding problem. Change this line:

npp::ImageCPU_8u_C4 oHostDst(oDeviceDst.size());

To this:

npp::ImageCPU_8u_C4 oHostDst(oDeviceSrc.size());

What is happening?

Let's assume your input image is 600x450.

  • oHostSrc is 600x450, and its pitch is 600 x 4 = 2400 bytes.
  • The memcpy from data to oHostSrc is fine because both have the same width and pitch.
  • oDeviceSrc picks up its size from oHostSrc (600x450).
  • oDeviceDst is slightly smaller than oDeviceSrc because it only picks up the size of the ROI, so it is something like 596x446.
  • Your code created oHostDst with the same size as oDeviceDst, so about 596x446.
  • The .copyTo operation copies the (pitched) 596x446 oDeviceDst image to the (unpitched) oHostDst, also 596x446.
  • The final memcpy breaks the image because it copies a 596x446 oHostDst image into a 600x450 data region.
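
To see why a flat memcpy between buffers of different widths shears the picture, here is a minimal host-only sketch. landingPosition is a hypothetical helper made up for illustration, not part of NPP; it works in whole pixels and uses the approximate 596 and 600 widths from the list above. It computes where a source pixel ends up when a tightly packed image is copied as one linear block and then read back with a wider row stride.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>

// Hypothetical helper (not part of NPP): where does source pixel (x, y) end up
// when a tightly packed image that is srcWidth pixels wide is copied with one
// flat memcpy into a buffer that is later read as dstWidth pixels wide?
// Returns {column, row} in the destination.
std::pair<std::size_t, std::size_t> landingPosition(std::size_t x, std::size_t y,
                                                    std::size_t srcWidth,
                                                    std::size_t dstWidth) {
    assert(x < srcWidth && dstWidth > 0);
    const std::size_t linear = y * srcWidth + x;    // pixel index in the packed source
    return {linear % dstWidth, linear / dstWidth};  // same index read with the wider stride
}
```

With srcWidth = 596 and dstWidth = 600, the first pixel of source row 1 lands at column 596 of destination row 0, and the first pixel of source row 2 lands at column 592 of destination row 1. Every row starts 4 pixels further left than the one before it, which produces the diagonal smear. Note also that the memcpy in the question copies width * height * 4 bytes computed from the full-size image, so it reads past the end of the smaller oHostDst buffer.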

The solution is to create oHostDst at 600x450 and let the .copyTo operation handle the difference in line sizes and pitches.
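
For reference, this is roughly what "handling the difference in line sizes and pitches" means. The sketch below is a host-only illustration, not the NPP sample's implementation: copyRows is a hypothetical helper that copies one row at a time and advances source and destination by their own pitch in bytes. This is the same idea that cudaMemcpy2D applies to transfers between device and host.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper (not part of NPP): copy `height` rows of `rowBytes` bytes
// each, stepping through source and destination with their own pitch
// (bytes from the start of one row to the start of the next).
void copyRows(unsigned char *dst, std::size_t dstPitch,
              const unsigned char *src, std::size_t srcPitch,
              std::size_t rowBytes, std::size_t height) {
    assert(rowBytes <= dstPitch && rowBytes <= srcPitch);
    for (std::size_t y = 0; y < height; ++y) {
        // Only the payload of each row is copied; padding is left untouched.
        std::memcpy(dst + y * dstPitch, src + y * srcPitch, rowBytes);
    }
}
```

With a helper like this, a 596x446 RGBA result could be written into the 600x450 data buffer by passing rowBytes = 596 * 4, height = 446, the source's own pitch, and dstPitch = 600 * 4. The rightmost columns and bottom rows of the destination then keep whatever they held before the copy.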

The original sample code didn't have this problem because there were no unpitched copies anywhere in it (e.g. no use of raw memcpy). As long as you handle the source and destination pitch and width explicitly at every copy step, it does not matter whether you create the final image as 600x450 or 596x446. But your final memcpy operation did not handle pitch and width explicitly. Instead, it implicitly assumed that source and destination were the same size, which was not the case.