Apple added new features in iOS 11 that let you use the Vision framework to do object detection without models. I tried these new APIs but found that the results from VNDetectRectanglesRequest are not good. Am I using the APIs correctly?
Here are some good cases:
And some bad cases:
Here is my code:
func captureOutput(_ output: AVCaptureOutput, didOutput sampleBuffer: CMSampleBuffer, from connection: AVCaptureConnection) {
    // bail out if the sample buffer carries no image data
    guard let pixelBuffer: CVPixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer) else { return }
    // create the request
    let request2 = VNDetectRectanglesRequest { (request, error) in
        self.VNDetectRectanglesRequestCompletionBlock(request: request, error: error)
    }
    do {
        request2.minimumConfidence = 0.7
        try self.visionSequenceHandler.perform([request2], on: pixelBuffer)
    } catch {
        print("Throws: \(error)")
    }
}
func VNDetectRectanglesRequestCompletionBlock(request: VNRequest, error: Error?) {
    if let array = request.results {
        if array.count > 0 {
            let ob = array.first as? VNRectangleObservation
            print("count: \(array.count)")
            print("fps: \(self.measureFPS())")
            DispatchQueue.main.async {
                let boxRect = ob!.boundingBox
                let transRect = self.transformRect(fromRect: boxRect, toViewRect: self.cameraLayer.frame)
                var transformedRect = ob!.boundingBox
                //transformedRect.origin.y = 1 - transformedRect.origin.y
                let convertedRect = self.cameraLayer.layerRectConverted(fromMetadataOutputRect: transformedRect)
                self.highlightView?.frame = convertedRect
            }
        }
    }
}
A lot of misconception, expectation, and black-box issues have been brought up already. Aside from those, you’re also using the API incorrectly.
The rectangle detector finds areas in the image that appear to represent real-world rectangular shapes. In most cases, the camera capturing an image sees a real rectangular object in perspective — so its 3D projection onto the 2D image plane will usually not be rectangular. For example, the 2D projection of the computer screen in one of your photos is more trapezoidal, because the top corners are farther from the camera than the bottom corners.
You get this shape by looking at the actual corners of the detected rectangle: see the topLeft, topRight, bottomLeft, and bottomRight properties of the VNRectangleObservation object. If you draw lines between those four corners, you’ll usually find something that better tracks the shape of a computer screen, piece of paper, etc. in your photo.
The boundingBox property instead gets you the smallest rectangular area (that is, rectangular in image space) containing those corner points. So it won’t follow the shape of a real rectangular object unless your camera perspective is just right.
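Drawing lines between the four corners could look roughly like the sketch below. It is not a drop-in fix: the helper names viewPoint(fromNormalized:in:) and quadPath(for:in:) are made up for illustration, and the conversion assumes the view shows the whole image with the same orientation and no aspect-fill cropping. With an AVCaptureVideoPreviewLayer you would likely need the layer's own conversion methods instead of plain scaling. Vision reports corners in normalized coordinates with the origin at the bottom left, so the y axis is flipped for UIKit.

```swift
import UIKit
import Vision

/// Hypothetical helper: maps a point from Vision's normalized space
/// (0...1, origin at bottom-left) into a view of the given size
/// (origin at top-left). Assumes the image fills the view exactly.
func viewPoint(fromNormalized point: CGPoint, in size: CGSize) -> CGPoint {
    return CGPoint(x: point.x * size.width,
                   y: (1 - point.y) * size.height)
}

/// Hypothetical helper: builds a closed quadrilateral through the four
/// detected corners instead of using the axis-aligned boundingBox.
func quadPath(for observation: VNRectangleObservation, in size: CGSize) -> UIBezierPath {
    // Walk the corners in order around the shape so the edges do not cross.
    let corners = [observation.topLeft, observation.topRight,
                   observation.bottomRight, observation.bottomLeft]
        .map { viewPoint(fromNormalized: $0, in: size) }

    let path = UIBezierPath()
    path.move(to: corners[0])
    corners.dropFirst().forEach { path.addLine(to: $0) }
    path.close()
    return path
}
```

On the main queue, the result could then be assigned to a CAShapeLayer placed over the camera preview, for example shapeLayer.path = quadPath(for: ob, in: view.bounds.size).cgPath, where shapeLayer is an overlay layer you would create yourself.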