I wrote an example of BoxFilter using NPP, but the output image looks broken. This is my code:
#include <stdio.h>
#include <string.h>
#include <ImagesCPU.h>
#include <ImagesNPP.h>
#include <Exceptions.h>
#include <npp.h>
#include "utils.h"
void boxfilter1_transform( Npp8u *data, int width, int height ){
    size_t size = width * height * 4;
    // declare a host image object for an 8-bit RGBA image
    npp::ImageCPU_8u_C4 oHostSrc(width, height);
    Npp8u *nDstData = oHostSrc.data();
    memcpy(nDstData, data, size * sizeof(Npp8u));
    // declare a device image and copy construct from the host image,
    // i.e. upload host to device
    npp::ImageNPP_8u_C4 oDeviceSrc(oHostSrc);
    // create struct with box-filter mask size
    NppiSize oMaskSize = {3, 3};
    // Allocate memory for pKernel
    Npp32s hostKernel[9] = {1, 1, 1, 1, 1, 1, 1, 1, 1};
    Npp32s *pKernel;
    checkCudaErrors( cudaMalloc((void**)&pKernel, oMaskSize.width * oMaskSize.height * sizeof(Npp32s)) );
    checkCudaErrors( cudaMemcpy(pKernel, hostKernel, oMaskSize.width * oMaskSize.height * sizeof(Npp32s),
                                cudaMemcpyHostToDevice) );
    Npp32s nDivisor = 9;
    // create struct with ROI size given the current mask
    NppiSize oSizeROI = {oDeviceSrc.width() - oMaskSize.width + 1, oDeviceSrc.height() - oMaskSize.height + 1};
    // allocate device image of appropriately reduced size
    npp::ImageNPP_8u_C4 oDeviceDst(oSizeROI.width, oSizeROI.height);
    // set anchor point inside the mask
    NppiPoint oAnchor = {2, 2};
    // run box filter
    NppStatus eStatusNPP;
    eStatusNPP = nppiFilter_8u_C4R(oDeviceSrc.data(), oDeviceSrc.pitch(),
                                   oDeviceDst.data(), oDeviceDst.pitch(),
                                   oSizeROI, pKernel, oMaskSize, oAnchor, nDivisor);
    //printf("NppiFilter error status %d\n", eStatusNPP);
    NPP_DEBUG_ASSERT(NPP_NO_ERROR == eStatusNPP);
    // declare a host image for the result
    npp::ImageCPU_8u_C4 oHostDst(oDeviceDst.size());
    // and copy the device result data into it
    oDeviceDst.copyTo(oHostDst.data(), oHostDst.pitch());
    memcpy(data, oHostDst.data(), size * sizeof(Npp8u));
    return;
}
Most of the code was copied from the boxFilterNPP.cpp example. This is the output image: http://img153.imageshack.us/img153/7716/o8z.png
Why does this happen?
You have a striding problem. Change this line:
npp::ImageCPU_8u_C4 oHostDst(oDeviceDst.size());
To this:
npp::ImageCPU_8u_C4 oHostDst(oDeviceSrc.size());
What is happening?
Let's assume your input image is 600x450.
oHostSrc is 600x450, and its pitch is 600 x 4 = 2400 bytes.
The memcpy from data to oHostSrc is fine because they have the same width and pitch.
oDeviceSrc picks up its size from oHostSrc (600x450).
oDeviceDst is slightly smaller than oDeviceSrc, because it only picks up the size of the ROI: with your 3x3 mask that is 598x448 (600 - 3 + 1 by 450 - 3 + 1).
oHostDst is created to be the same size as oDeviceDst, so also 598x448.
The .copyTo operation copies the (pitched) 598x448 oDeviceDst image to the (unpitched) oHostDst, also 598x448.
The final memcpy breaks the image, because it copies a 598x448 oHostDst image into a 600x450 data region.
The solution is to create oHostDst at 600x450 and let the .copyTo operation handle the difference in line sizes and pitches.
The original sample code didn't have this problem because there were no unpitched copies anywhere in that code (e.g. no use of raw memcpy). As long as you handle the source and destination pitch and width explicitly at every copy step, it does not matter whether you create the final image as 600x450 or at the smaller ROI size. But your final memcpy operation was not handling pitch and width explicitly; instead it implicitly assumed that source and destination were the same size, which was not the case.
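To make "handle pitch and width explicitly" concrete, here is a minimal host-only sketch in plain C++ with no NPP or CUDA calls. The names copyPitched, pixelOffset and makeGradient are hypothetical helpers invented for this illustration; they are not part of NPP. The idea is the same one a pitched 2D copy relies on: copy one row at a time and advance source and destination by their own pitch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical helper (not an NPP function): copy `rows` rows of `rowBytes`
// bytes each. Source and destination advance by their own pitch (bytes per
// row), so the two buffers may have different widths.
void copyPitched(unsigned char *dst, std::size_t dstPitch,
                 const unsigned char *src, std::size_t srcPitch,
                 std::size_t rowBytes, std::size_t rows)
{
    // A row must fit inside one line of both buffers.
    assert(rowBytes <= dstPitch && rowBytes <= srcPitch);
    for (std::size_t y = 0; y < rows; ++y) {
        std::memcpy(dst + y * dstPitch, src + y * srcPitch, rowBytes);
    }
}

// Byte offset of pixel (x, y) in a buffer with the given pitch.
std::size_t pixelOffset(std::size_t x, std::size_t y,
                        std::size_t pitch, std::size_t bytesPerPixel)
{
    return y * pitch + x * bytesPerPixel;
}

// Hypothetical test image: tightly packed, every byte of pixel (x, y)
// holds the value y * 10 + x so that misplaced pixels are easy to spot.
std::vector<unsigned char> makeGradient(std::size_t width, std::size_t height,
                                        std::size_t bytesPerPixel)
{
    std::vector<unsigned char> img(width * height * bytesPerPixel);
    const std::size_t pitch = width * bytesPerPixel;
    for (std::size_t y = 0; y < height; ++y)
        for (std::size_t x = 0; x < width; ++x)
            for (std::size_t c = 0; c < bytesPerPixel; ++c)
                img[pixelOffset(x, y, pitch, bytesPerPixel) + c] =
                    static_cast<unsigned char>(y * 10 + x);
    return img;
}
```

With a helper like this, the last step of your function could keep oHostDst at the ROI size and call something like copyPitched(data, width * 4, oHostDst.data(), oHostDst.pitch(), oSizeROI.width * 4, oSizeROI.height) instead of the flat memcpy. A flat memcpy from a narrower image into a wider buffer places each source row a few pixels further off than the previous one, which is the kind of diagonal shear you would expect to see in the output. In your code it also reads past the end of oHostDst, because size is computed from the full width and height while oHostDst is smaller.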