I have a C# / .NET application that uses a GPU (an NVIDIA GTX 980) to do image processing. There are 4 stages, and I synchronize the CPU to the GPU (no overlap in time) to do the timing. But the numbers do not add up: the four stage times sum to roughly 8.15 ms, yet the measured total is only about 4.28 ms.
Launch() does an asynchronous launch of the GPU kernel, but Synchronize() waits until it is done.
- tThreshold (total): 4.2827 ms
- tHistogram: 3.7714 ms
- tHistogramSum: 0.1065 ms
- tIQR: 3.8603 ms
- tThresholdOnly: 0.4126 ms
What is going on?
public static void threshold()
{
    Stopwatch watch = new Stopwatch();
    watch.Start();

    gpu.Lock();

    // Stage 1: build the histogram over the frame
    dim3 block = new dim3(tileWidthBig, tileHeightBig);
    dim3 grid = new dim3(Frame.width / tileWidthBig, Frame.height / tileHeightBig);
    gpu.Launch(grid, block).gHistogram(gForeground, gPercentile, gInfo);
    gpu.Synchronize();
    tHistogram = watch.Elapsed.TotalMilliseconds;

    // Stage 2: sum the histogram
    block = new dim3(1024);
    grid = new dim3(1);
    gpu.Launch(grid, block).gSumHistogram(gPercentile);
    gpu.Synchronize();
    tHistogramSum = watch.Elapsed.TotalMilliseconds - tHistogram;

    // Stage 3: compute the IQR
    gpu.Launch(grid, block).gIQR(gPercentile, gInfo);
    gpu.Synchronize();
    tIQR = watch.Elapsed.TotalMilliseconds - tHistogramSum;

    // Stage 4: apply the threshold to the frame
    block = new dim3(256, 4);
    grid = new dim3(Frame.width / 256, Frame.height / 4);
    gpu.Launch(grid, block).gThreshold(gForeground, gMask, gInfo);
    gpu.Synchronize();
    tThresholdOnly = watch.Elapsed.TotalMilliseconds - tIQR;

    gpu.Unlock();

    watch.Stop();
    tThreshold = watch.Elapsed.TotalMilliseconds;
}
Since TotalMilliseconds keeps incrementing and you are taking differences between points in time, every measurement after the second one has to subtract the sum of all the preceding differences, hence:

tIQR = watch.Elapsed.TotalMilliseconds - (tHistogram + tHistogramSum);

and

tThresholdOnly = watch.Elapsed.TotalMilliseconds - (tHistogram + tHistogramSum + tIQR);
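A slightly less error-prone way to express the same fix is to diff against a running checkpoint instead of maintaining the sums by hand. A minimal sketch, assuming C# 7 local functions; Checkpoint and last are my names, everything else mirrors the posted code:

double last = 0.0;
double Checkpoint()
{
    // Time elapsed since the previous checkpoint; advances the checkpoint.
    double now = watch.Elapsed.TotalMilliseconds;
    double delta = now - last;
    last = now;
    return delta;
}

gpu.Launch(grid, block).gHistogram(gForeground, gPercentile, gInfo);
gpu.Synchronize();
tHistogram = Checkpoint();
// ...and likewise tHistogramSum, tIQR and tThresholdOnly after the
// remaining three Synchronize() calls.

As a sanity check with the posted numbers: the incorrect tIQR (3.8603 ms) and tThresholdOnly (0.4126 ms) imply cumulative times of 3.8603 + 0.1065 = 3.9668 ms and 0.4126 + 3.8603 = 4.2729 ms, so the real per-stage times are roughly 3.77, 0.11, 0.09 and 0.31 ms. Those sum to 4.2729 ms, which agrees with the 4.2827 ms total once the Unlock()/Stop() overhead is included.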