Using DJI video feed with Vision Framework


I'm working on an app that uses the video feed from the DJI Mavic 2 and runs it through a machine learning model to identify objects.

I managed to get my app to preview the feed from the drone using this sample DJI project, but I'm having a lot of trouble getting the video data into a format that's usable by the Vision framework.

I used this example from Apple as a guide to create my model (which is working!), but it looks like I need to create a VNImageRequestHandler object, which is initialized with a CVPixelBuffer (in Apple's sample, extracted from a CMSampleBuffer), in order to use Vision.
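
For context, Apple's sample pulls the pixel buffer for Vision out of the capture delegate's CMSampleBuffer, roughly like this (a paraphrased sketch, with self.requests being the [VNRequest] array from that sample):

import AVFoundation
import Vision

// Inside the AVCaptureVideoDataOutputSampleBufferDelegate:
func captureOutput(_ output: AVCaptureOutput, didOutput sampleBuffer: CMSampleBuffer, from connection: AVCaptureConnection) {
    // CMSampleBufferGetImageBuffer unwraps the CVPixelBuffer that Vision needs
    guard let pixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer) else { return }
    let handler = VNImageRequestHandler(cvPixelBuffer: pixelBuffer, options: [:])
    try? handler.perform(self.requests)
}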

Any idea how to make this conversion? Is there a better way to do this?

class DJICameraViewController: UIViewController, DJIVideoFeedListener, DJISDKManagerDelegate, DJICameraDelegate, VideoFrameProcessor {

// ...

func videoFeed(_ videoFeed: DJIVideoFeed, didUpdateVideoData rawData: Data) {
    // Push the raw H.264 stream into DJIVideoPreviewer, which parses and decodes it
    let videoData = rawData as NSData
    let videoBuffer = UnsafeMutablePointer<UInt8>.allocate(capacity: videoData.length)
    videoData.getBytes(videoBuffer, length: videoData.length)
    DJIVideoPreviewer.instance().push(videoBuffer, length: Int32(videoData.length))
}

// MARK: VideoFrameProcessor Protocol Implementation
func videoProcessorEnabled() -> Bool {
    // This is never called
    return true
}

func videoProcessFrame(_ frame: UnsafeMutablePointer<VideoFrameYUV>!) {
    // This is never called
    // cv_pixelbuffer_fastupload is a raw pointer that holds a CVPixelBuffer when
    // hardware decoding is on; a plain `as!` cast won't compile, so reinterpret it
    let pixelBuffer = unsafeBitCast(frame.pointee.cv_pixelbuffer_fastupload, to: CVPixelBuffer.self)

    let imageRequestHandler = VNImageRequestHandler(cvPixelBuffer: pixelBuffer, orientation: exifOrientationFromDeviceOrientation(), options: [:])
    
    do {
        try imageRequestHandler.perform(self.requests)
    } catch {
        print(error)
    }
}
} // End of DJICameraViewController class

EDIT: from what I've gathered from DJI's (spotty) documentation, it looks like the video feed is H.264-compressed. They claim DJIWidget includes helper methods for decompression, but I haven't managed to figure out how to use them correctly because there is no documentation around their use.

EDIT 2: Here's the issue I created on GitHub for the DJIWidget framework

EDIT 3: Updated code snippet with additional methods for VideoFrameProcessor, removing old code from videoFeed method

EDIT 4: Details about how to extract the pixel buffer successfully and utilize it can be found in this comment from GitHub

EDIT 5: It's been years since I worked on this but since there is still some activity here, here's a relevant gist I created to help others. I can't remember specifics around how/why this was relevant, but hopefully it makes sense!

1 Answer

Answered by dji-dev-Tim:

The steps:

  1. Call DJIVideoPreviewer's push:length: method with the rawData. (If you are using VideoPreviewerSDKAdapter, it already pushes the data for you, so skip this step.) H.264 parsing and decoding are performed once you do this.

  2. Conform to the VideoFrameProcessor protocol and register your object via DJIVideoPreviewer's registFrameProcessor: method (see the Swift sketch after this list).

  3. The VideoFrameProcessor protocol's videoProcessFrame: method will then be called with the decoded VideoFrameYUV data.

  4. Get the CVPixelBuffer data. The VideoFrameYUV struct has a cv_pixelbuffer_fastupload field; when hardware decoding is turned on, this field actually holds a CVPixelBuffer. If you are using software decoding, you will need to create a CVPixelBuffer yourself and copy the data from the VideoFrameYUV's luma, chromaB, and chromaR fields.

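In Swift, steps 1 through 3 plus the hardware-decoding branch of step 4 could look roughly like this (a minimal sketch; the enableHardwareDecode property name and the unsafeBitCast are my reading of DJIWidget, so verify against your SDK version):

import DJIWidget
import Vision

final class FrameProcessorExample: NSObject, VideoFrameProcessor {
    var requests: [VNRequest] = []

    func start() {
        // Assumed property: turns on hardware decoding so cv_pixelbuffer_fastupload is populated
        DJIVideoPreviewer.instance().enableHardwareDecode = true
        DJIVideoPreviewer.instance().registFrameProcessor(self)  // step 2
    }

    func videoProcessorEnabled() -> Bool { return true }

    // Steps 3-4: called by DJIWidget with each decoded frame
    func videoProcessFrame(_ frame: UnsafeMutablePointer<VideoFrameYUV>!) {
        guard let fastUpload = frame.pointee.cv_pixelbuffer_fastupload else {
            // Software decoding: build the CVPixelBuffer manually from the
            // luma/chromaB/chromaR planes -- see the Objective-C code below
            return
        }
        // Hardware decoding: the raw pointer already holds a CVPixelBuffer
        let pixelBuffer = unsafeBitCast(fastUpload, to: CVPixelBuffer.self)
        let handler = VNImageRequestHandler(cvPixelBuffer: pixelBuffer, options: [:])
        try? handler.perform(self.requests)
    }
}
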

Code (the software-decoding case from step 4):

VideoFrameYUV *yuvFrame; // the VideoFrameProcessor output
CVPixelBufferRef pixelBuffer = NULL;
CVReturn result = CVPixelBufferCreate(kCFAllocatorDefault,
                                      yuvFrame->width,
                                      yuvFrame->height,
                                      kCVPixelFormatType_420YpCbCr8Planar,
                                      NULL,
                                      &pixelBuffer);
if (result != kCVReturnSuccess || pixelBuffer == NULL) {
    return;
}
if (CVPixelBufferLockBaseAddress(pixelBuffer, 0) != kCVReturnSuccess) {
    return;
}
// Copy each plane from the decoded frame into the new pixel buffer.
// Note: this assumes each plane's bytes-per-row equals its width; if
// CVPixelBufferGetBytesPerRowOfPlane reports padding, copy row by row instead.
size_t yPlaneWidth = CVPixelBufferGetWidthOfPlane(pixelBuffer, 0);
size_t yPlaneHeight = CVPixelBufferGetHeightOfPlane(pixelBuffer, 0);
size_t uPlaneWidth = CVPixelBufferGetWidthOfPlane(pixelBuffer, 1);
size_t uPlaneHeight = CVPixelBufferGetHeightOfPlane(pixelBuffer, 1);
size_t vPlaneWidth = CVPixelBufferGetWidthOfPlane(pixelBuffer, 2);
size_t vPlaneHeight = CVPixelBufferGetHeightOfPlane(pixelBuffer, 2);
uint8_t *yDestination = CVPixelBufferGetBaseAddressOfPlane(pixelBuffer, 0);
memcpy(yDestination, yuvFrame->luma, yPlaneWidth * yPlaneHeight);
uint8_t *uDestination = CVPixelBufferGetBaseAddressOfPlane(pixelBuffer, 1);
memcpy(uDestination, yuvFrame->chromaB, uPlaneWidth * uPlaneHeight);
uint8_t *vDestination = CVPixelBufferGetBaseAddressOfPlane(pixelBuffer, 2);
memcpy(vDestination, yuvFrame->chromaR, vPlaneWidth * vPlaneHeight);
CVPixelBufferUnlockBaseAddress(pixelBuffer, 0);
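
Once the planes are copied and the base address unlocked, this pixelBuffer can be handed to VNImageRequestHandler exactly as in the question's videoProcessFrame method.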