I've been trying to get OpenCV ORB feature detection working asynchronously on the GPU, but the timings of the individual function calls are not what I expect. The function detectAndComputeAsync still appears to block the CPU for a significant amount of time.
My code is set up similarly to the following:
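    #include <opencv2/core/cuda.hpp>
    #include <opencv2/cudafeatures2d.hpp>

    void detectFeatures(cv::cuda::GpuMat& greyscaleImage)
    {
        // In the full project the detector is reached through m_rmatcher->getFeatureDetector().
        cv::Ptr<cv::cuda::ORB> orb = cv::cuda::ORB::create(300, 1.2f, 8, 31, 0, 2, 0, 31, 20, true);
        cv::cuda::Stream stream;
        cv::cuda::GpuMat keypoints, descriptors;
        // Enqueue detection on the stream; this is the call that takes around 12 ms.
        orb->detectAndComputeAsync(greyscaleImage, cv::noArray(), keypoints, descriptors, false, stream);
        // Block until all work queued on the stream has finished (around 2 ms).
        stream.waitForCompletion();
    }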
I would expect the asynchronous call to defer most of the CPU time to stream.waitForCompletion(). However, only around 2 ms is spent on that line, while detectAndComputeAsync itself still takes around 12 ms. Is there something wrong in my setup, or does detectAndComputeAsync really do a large amount of unavoidable CPU work?
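For reference, the per-call timings above can be taken with a small wall-clock helper along these lines. This is only a sketch: elapsedMs is a hypothetical name of my own, not part of OpenCV, and it measures host-side time only, so it shows how long the CPU thread is held up rather than how long the GPU is busy.

```cpp
#include <chrono>
#include <utility>

// Hypothetical helper (not an OpenCV function): runs `fn` once and returns
// the wall-clock time the calling thread spent in it, in milliseconds.
template <typename Fn>
double elapsedMs(Fn&& fn)
{
    const auto start = std::chrono::steady_clock::now();
    std::forward<Fn>(fn)();
    const auto end = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(end - start).count();
}
```

Wrapping the detectAndComputeAsync call and the stream.waitForCompletion() call in separate lambdas and passing each to elapsedMs gives the two figures separately. A truly asynchronous enqueue would be expected to return in well under a millisecond, with most of the time showing up in the wait.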
According to this discussion on the OpenCV forum, this is expected: detectAndComputeAsync is not truly asynchronous, because it blocks the CPU while synchronising the CUDA calls. https://forum.opencv.org/t/asynchronous-feature-detection-not-running-asynchronously/10889
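If the blocking cannot be avoided, one possible workaround (my own assumption, not something taken from the forum thread) is to run the whole detection step on a worker thread so that the calling thread stays free for other CPU work. The sketch below uses only the C++ standard library. blockingDetect is a hypothetical stand-in that sleeps for about 12 ms in place of the detectAndComputeAsync and waitForCompletion pair, and it does not call OpenCV at all.

```cpp
#include <chrono>
#include <future>
#include <thread>

// Hypothetical stand-in for the detectAndComputeAsync + waitForCompletion pair:
// it only sleeps for about 12 ms and returns a dummy keypoint count.
int blockingDetect()
{
    std::this_thread::sleep_for(std::chrono::milliseconds(12));
    return 300;
}

// Starts the blocking step on a worker thread and returns immediately.
// The caller collects the result later with get() on the returned future.
std::future<int> detectOnWorker()
{
    return std::async(std::launch::async, blockingDetect);
}
```

The caller would keep the future returned by detectOnWorker(), carry on with other work, and call get() once the result is needed. Note that a future obtained from std::async blocks in its destructor if it has not been waited on, so it should be kept alive for as long as the overlap is wanted. With the real OpenCV call, I would also expect each worker thread to need its own cv::cuda::Stream, since the Stream class is documented as not thread-safe.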