I'm working on an app that uses the video feed from the DJI Mavic 2 and runs it through a machine learning model to identify objects.
I managed to get my app to preview the feed from the drone using this sample DJI project, but I'm having a lot of trouble trying to get the video data into a format that's usable by the Vision
framework.
I used this example from Apple as a guide to create my model (which is working!), but it looks like I need to create a VNImageRequestHandler object, which is initialized with a CVPixelBuffer (in Apple's example, pulled out of a CMSampleBuffer), in order to use Vision.
Any idea how to make this conversion? Is there a better way to do this?
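For reference, the pattern in Apple's example is to extract the CVPixelBuffer from each CMSampleBuffer delivered by the capture output and hand it to a VNImageRequestHandler. A minimal sketch of that pattern (this is the AVCaptureVideoDataOutputSampleBufferDelegate callback; self.requests stands in for the [VNRequest] array built around the Core ML model, and the orientation handling is simplified):

import AVFoundation
import Vision

func captureOutput(_ output: AVCaptureOutput, didOutput sampleBuffer: CMSampleBuffer, from connection: AVCaptureConnection) {
    // Get the CVPixelBuffer backing this sample buffer
    guard let pixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer) else { return }

    // Apple's sample derives the orientation from the device orientation; .up is a placeholder
    let handler = VNImageRequestHandler(cvPixelBuffer: pixelBuffer, orientation: .up, options: [:])
    do {
        try handler.perform(self.requests)
    } catch {
        print(error)
    }
}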
class DJICameraViewController: UIViewController, DJIVideoFeedListener, DJISDKManagerDelegate, DJICameraDelegate, VideoFrameProcessor {

    // ...

    func videoFeed(_ videoFeed: DJIVideoFeed, didUpdateVideoData rawData: Data) {
        let videoData = rawData as NSData
        let videoBuffer = UnsafeMutablePointer<UInt8>.allocate(capacity: videoData.length)
        videoData.getBytes(videoBuffer, length: videoData.length)
        DJIVideoPreviewer.instance().push(videoBuffer, length: Int32(videoData.length))
    }

    // MARK: VideoFrameProcessor Protocol Implementation

    func videoProcessorEnabled() -> Bool {
        // This is never called
        return true
    }

    func videoProcessFrame(_ frame: UnsafeMutablePointer<VideoFrameYUV>!) {
        // This is never called
        let pixelBuffer = frame.pointee.cv_pixelbuffer_fastupload as! CVPixelBuffer
        let imageRequestHandler = VNImageRequestHandler(cvPixelBuffer: pixelBuffer, orientation: exifOrientationFromDeviceOrientation(), options: [:])
        do {
            try imageRequestHandler.perform(self.requests)
        } catch {
            print(error)
        }
    }

} // End of DJICameraViewController class
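As it turns out (see the steps under EDIT 5 below), the VideoFrameProcessor callbacks only fire once the processor has been registered with DJIVideoPreviewer. A rough sketch of that wiring, assuming it lives in viewDidLoad next to the usual video feed setup from the DJI sample project (registFrameProcessor comes from DJIWidget; the rest mirrors the sample):

override func viewDidLoad() {
    super.viewDidLoad()

    // Listen to the drone's primary video feed (as in the DJI sample project)
    DJISDKManager.videoFeeder()?.primaryVideoFeed.add(self, with: nil)

    // Register this object as a frame processor so videoProcessorEnabled() /
    // videoProcessFrame(_:) are actually called, then start the previewer
    DJIVideoPreviewer.instance().registFrameProcessor(self)
    DJIVideoPreviewer.instance().start()
}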
EDIT: From what I've gathered from DJI's (spotty) documentation, it looks like the video feed is H.264-compressed. They claim that DJIWidget includes helper methods for decompression, but I haven't had any success figuring out how to use them correctly because there is no documentation surrounding their use.
EDIT 2: Here's the issue I created on GitHub for the DJIWidget framework
EDIT 3: Updated the code snippet with additional methods for VideoFrameProcessor, removing the old code from the videoFeed method
EDIT 4: Details about how to extract the pixel buffer successfully and utilize it can be found in this comment from GitHub
EDIT 5: It's been years since I worked on this, but since there is still some activity here, here's a relevant gist I created to help others. I can't remember the specifics around how/why it was relevant, but hopefully it makes sense!
The steps:

1. Call DJIVideoPreviewer's push:length: method and feed it the rawData. If you are using VideoPreviewerSDKAdapter, skip this step (it already does this for you). The H.264 parsing and decoding are performed once you do this.

2. Conform to the VideoFrameProcessor protocol and call DJIVideoPreviewer.registFrameProcessor to register the VideoFrameProcessor object. The protocol's videoProcessFrame: method will output the VideoFrameYUV data.

3. Get the CVPixelBuffer data. The VideoFrameYUV struct has a cv_pixelbuffer_fastupload field; when hardware decoding is turned on, this field actually holds a CVPixelBuffer. If you are using software decoding, you will need to create a CVPixelBuffer yourself and copy the data from the VideoFrameYUV's luma, chromaB and chromaR fields (see the sketch below).
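The gist itself isn't reproduced here, but a rough sketch of step 3, wired into the videoProcessFrame callback inside DJICameraViewController, could look like the following. The VideoFrameYUV field names (width, height, luma, chromaB, chromaR, cv_pixelbuffer_fastupload) come from DJIWidget; the use of unsafeBitCast for the fast-upload case, the planar pixel format, and the assumption that the source planes are tightly packed (stride == width) are simplifications and may need adjusting. Vision may also prefer a bi-planar (NV12) or BGRA buffer, so a conversion step could be required for the software-decoding path:

func videoProcessFrame(_ frame: UnsafeMutablePointer<VideoFrameYUV>!) {
    var pixelBuffer: CVPixelBuffer?

    if frame.pointee.cv_pixelbuffer_fastupload != nil {
        // Hardware decoding: cv_pixelbuffer_fastupload already holds a CVPixelBuffer
        pixelBuffer = unsafeBitCast(frame.pointee.cv_pixelbuffer_fastupload, to: CVPixelBuffer.self)
    } else {
        // Software decoding: build a planar 4:2:0 pixel buffer and copy the three planes into it
        pixelBuffer = makePixelBuffer(from: frame.pointee)
    }

    guard let buffer = pixelBuffer else { return }

    let handler = VNImageRequestHandler(cvPixelBuffer: buffer, options: [:])
    do {
        try handler.perform(self.requests)
    } catch {
        print(error)
    }
}

private func makePixelBuffer(from frame: VideoFrameYUV) -> CVPixelBuffer? {
    let width = Int(frame.width)
    let height = Int(frame.height)

    var buffer: CVPixelBuffer?
    let status = CVPixelBufferCreate(kCFAllocatorDefault, width, height,
                                     kCVPixelFormatType_420YpCbCr8Planar, nil, &buffer)
    guard status == kCVReturnSuccess, let pixelBuffer = buffer else { return nil }

    CVPixelBufferLockBaseAddress(pixelBuffer, [])
    defer { CVPixelBufferUnlockBaseAddress(pixelBuffer, []) }

    // Copy one plane row by row, respecting the destination's bytes-per-row padding.
    // Assumes the source plane is tightly packed (stride == width).
    func copy(plane index: Int, from source: UnsafeMutablePointer<UInt8>?, width: Int, height: Int) {
        guard let source = source,
              let dest = CVPixelBufferGetBaseAddressOfPlane(pixelBuffer, index) else { return }
        let destStride = CVPixelBufferGetBytesPerRowOfPlane(pixelBuffer, index)
        for row in 0..<height {
            memcpy(dest + row * destStride, source + row * width, width)
        }
    }

    copy(plane: 0, from: frame.luma, width: width, height: height)            // Y, full resolution
    copy(plane: 1, from: frame.chromaB, width: width / 2, height: height / 2) // Cb, quarter resolution
    copy(plane: 2, from: frame.chromaR, width: width / 2, height: height / 2) // Cr, quarter resolution

    return pixelBuffer
}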