Compute histogram for CVPixelBuffer using vImage


I have CVPixelBuffers in YCbCr (422 or 420 biplanar, 10-bit video range) coming from the camera. I see the vImage framework is sophisticated enough to handle a variety of image formats (including pixel buffers in various YCbCr formats). I want to compute histograms for both Y (luma) and RGB. For 8-bit YCbCr samples, I can use this code to compute the histogram of the Y component:

            // Requires `import Accelerate` and `import CoreVideo`.
            // vImageHistogramCalculation_Planar8 always writes 256 bins, so numBins must be 256.
            var lumaBins = [vImagePixelCount](repeating: 0, count: Int(numBins))

            CVPixelBufferLockBaseAddress(pixelBuffer, .readOnly)
            // Keep the buffer locked until vImage has finished reading the plane;
            // the base address is only valid while the lock is held.
            defer { CVPixelBufferUnlockBaseAddress(pixelBuffer, .readOnly) }

            // Plane 0 is the luma (Y) plane of a biplanar YCbCr buffer.
            let bytesPerRow = CVPixelBufferGetBytesPerRowOfPlane(pixelBuffer, 0)
            let baseAddress = CVPixelBufferGetBaseAddressOfPlane(pixelBuffer, 0)
            let height = CVPixelBufferGetHeightOfPlane(pixelBuffer, 0)
            let width = CVPixelBufferGetWidthOfPlane(pixelBuffer, 0)

            var buffer = vImage_Buffer(data: baseAddress, height: vImagePixelCount(height), width: vImagePixelCount(width), rowBytes: bytesPerRow)

            lumaBins.withUnsafeMutableBufferPointer { binsPtr in
                let error = vImageHistogramCalculation_Planar8(&buffer, binsPtr.baseAddress!, UInt32(kvImageNoFlags))

                guard error == kvImageNoError else {
                    fatalError("Error calculating histogram luma: \(error)")
                }
            }

How does one implement the same for 10-bit HDR pixel buffers, preferably using the new iOS 16 vImage APIs that provide a lot more flexibility (for instance, getting an RGB histogram from a YCbCr sample without explicitly performing a pixel format conversion)?


There is 1 answer

Ian Ollmann

As you can see in vImage/Histogram.h, there is (currently) only histogram functionality for 1- and 4-channel 8-bit and float images. Conceivably you could convert your 10-bit signal into a sequence of planar FP images and set the number of histogram entries to 2**10 (1024), but alas, I think this is not going to perform all that well.
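For illustration, here is a rough sketch of that planar-float route for the luma plane only. It assumes the 10-bit biplanar layout in which each sample occupies a 16-bit word with the 10 significant bits in the most significant bits, so scaling by 1/64 yields values in 0...1023. The helper name `lumaHistogramViaPlanarF` and its parameters are made up for this example; the caller is assumed to hold the pixel buffer lock and pass in plane 0. Treat it as a starting point to benchmark, not a recommended implementation.

```swift
#if canImport(Accelerate)
import Accelerate

/// Hypothetical helper: 1024-bin histogram of a plane of 16-bit samples
/// (10 significant bits in the MSBs), computed via a temporary float plane.
func lumaHistogramViaPlanarF(base: UnsafeMutableRawPointer,
                             width: Int, height: Int, rowBytes: Int) -> [vImagePixelCount]? {
    var src = vImage_Buffer(data: base,
                            height: vImagePixelCount(height),
                            width: vImagePixelCount(width),
                            rowBytes: rowBytes)

    // Temporary planar float image, tightly packed.
    let floatRowBytes = width * MemoryLayout<Float>.stride
    let floatData = UnsafeMutableRawPointer.allocate(byteCount: max(1, floatRowBytes * height),
                                                     alignment: 16)
    defer { floatData.deallocate() }
    var dst = vImage_Buffer(data: floatData,
                            height: vImagePixelCount(height),
                            width: vImagePixelCount(width),
                            rowBytes: floatRowBytes)

    // result = sample * scale + offset; a scale of 1/64 undoes the 6-bit left shift.
    guard vImageConvert_16UToF(&src, &dst, 0, 1.0 / 64.0,
                               vImage_Flags(kvImageNoFlags)) == kvImageNoError else {
        return nil
    }

    // 1024 bins spanning [0, 1024), so each bin covers exactly one 10-bit code value.
    var bins = [vImagePixelCount](repeating: 0, count: 1024)
    let error = vImageHistogramCalculation_PlanarF(&dst, &bins, 1024, 0, 1024,
                                                   vImage_Flags(kvImageNoFlags))
    return error == kvImageNoError ? bins : nil
}
#endif
```

Note the extra full-size float allocation and conversion pass, which is part of why this route is unlikely to beat a plain scalar loop.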

This isn't the end of the world though. It really isn't possible to vectorize histograms with today's SIMD ISAs. AVX-512 has a few tricks, but I'm not sure whether they actually pay off with an observed performance improvement; I haven't tried. Otherwise, it is for the most part an inherently scalar process, because the reads and writes to the histogram cannot be grouped together into vectors, and for a 10-bit image at least, unless you are planning to convert it to a perceptual colorspace first or something, there isn't much arithmetic to be done on the color values. Read them, find the bin, increment it. The vImage histogram routines are there for completeness, not necessarily because there was some amazing AltiVec (or now SSE/Neon) secret sauce to make them fly superfast.

For this reason, if you just write your own unoptimized scalar code, it should perform reasonably well compared to a hypothetical vImage function. This is also probably why the vImage team has not gone to town with histograms on all sorts of formats. A skilled practitioner can possibly get as much as another factor of two performance over naive scalar code with software pipelining or vector pixel loads to keep the histogram memory accesses from stomping on one another, but it isn't going to be the performance improvement you typically see elsewhere in vImage.
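A minimal sketch of such naive scalar code for the luma plane might look like the following. It assumes the 10-bit biplanar formats store each sample in a little-endian 16-bit word with the 10 significant bits in the most significant bits, so a right shift by 6 gives the bin index. The names `lumaHistogram10Bit` and `lumaHistogram(of:)` are made up for this example, and the same loop would need adapting for the interleaved CbCr plane.

```swift
import Foundation
#if canImport(CoreVideo)
import CoreVideo
#endif

/// Hypothetical helper: 1024-bin histogram of one plane of 16-bit samples
/// whose 10 significant bits sit in the most significant bits.
func lumaHistogram10Bit(base: UnsafeRawPointer,
                        width: Int, height: Int, rowBytes: Int) -> [UInt32] {
    var bins = [UInt32](repeating: 0, count: 1024)
    bins.withUnsafeMutableBufferPointer { out in
        for y in 0..<height {
            // Rows may be padded, so step by rowBytes rather than width * 2.
            let row = (base + y * rowBytes).assumingMemoryBound(to: UInt16.self)
            for x in 0..<width {
                // Drop the 6 padding bits to get a code value in 0...1023.
                out[Int(row[x] >> 6)] += 1
            }
        }
    }
    return bins
}

#if canImport(CoreVideo)
/// Hypothetical wrapper: histogram of plane 0 (luma) of a 10-bit biplanar buffer.
func lumaHistogram(of pixelBuffer: CVPixelBuffer) -> [UInt32]? {
    CVPixelBufferLockBaseAddress(pixelBuffer, .readOnly)
    // Hold the lock for as long as the plane's memory is being read.
    defer { CVPixelBufferUnlockBaseAddress(pixelBuffer, .readOnly) }
    guard let base = CVPixelBufferGetBaseAddressOfPlane(pixelBuffer, 0) else { return nil }
    return lumaHistogram10Bit(base: UnsafeRawPointer(base),
                              width: CVPixelBufferGetWidthOfPlane(pixelBuffer, 0),
                              height: CVPixelBufferGetHeightOfPlane(pixelBuffer, 0),
                              rowBytes: CVPixelBufferGetBytesPerRowOfPlane(pixelBuffer, 0))
}
#endif
```

For video-range content, most luma counts should land between code values 64 and 940, which is a quick sanity check on the shift.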

You might also see some improvement with dispatch_apply() to process the image concurrently in strips and add the histograms together at the end, but that depends on how bandwidth-bound the work is. Once you have written the scalar code to process the whole image, breaking it up for dispatch_apply() is pretty easy and you can do some benchmarking.
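In Swift, dispatch_apply() is exposed as DispatchQueue.concurrentPerform. The strip approach could be sketched roughly as below, under the same layout assumption as before (16-bit samples with the 10 significant bits in the most significant bits). The function names and the default strip count of 4 are arbitrary choices for this example; the right strip count is something to find by benchmarking.

```swift
import Foundation
import Dispatch

/// Hypothetical helper: scalar 1024-bin histogram over a range of rows.
func stripHistogram10Bit(base: UnsafeRawPointer, width: Int,
                         rows: Range<Int>, rowBytes: Int) -> [UInt32] {
    var bins = [UInt32](repeating: 0, count: 1024)
    bins.withUnsafeMutableBufferPointer { out in
        for y in rows {
            let row = (base + y * rowBytes).assumingMemoryBound(to: UInt16.self)
            for x in 0..<width {
                out[Int(row[x] >> 6)] += 1   // drop the 6 padding bits
            }
        }
    }
    return bins
}

/// Hypothetical helper: split the plane into horizontal strips, histogram each
/// strip concurrently, then add the partial histograms together.
func concurrentHistogram10Bit(base: UnsafeRawPointer, width: Int, height: Int,
                              rowBytes: Int, strips: Int = 4) -> [UInt32] {
    let stripCount = max(1, min(strips, height))
    var partials = [[UInt32]](repeating: [], count: stripCount)

    partials.withUnsafeMutableBufferPointer { slots in
        DispatchQueue.concurrentPerform(iterations: stripCount) { i in
            // Each iteration writes only its own slot, so no locking is needed.
            let start = i * height / stripCount
            let end = (i + 1) * height / stripCount
            slots[i] = stripHistogram10Bit(base: base, width: width,
                                           rows: start..<end, rowBytes: rowBytes)
        }
    }

    // Merge: the final histogram is the element-wise sum of the strips.
    var total = [UInt32](repeating: 0, count: 1024)
    for partial in partials {
        for bin in 0..<1024 { total[bin] += partial[bin] }
    }
    return total
}
```

Giving each strip its own private histogram avoids contention on shared bins; the merge at the end costs only 1024 additions per strip.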