iOS 11: Vision framework VNDetectRectanglesRequest does not detect objects precisely?


Apple has new features in iOS 11 that allow you to use the Vision framework to do object detection without models. I tried these new APIs but found the results from VNDetectRectanglesRequest are not good. Am I using the APIs correctly?

Here are some good cases:

[screenshot: detection result that tracks the target well]

[screenshot: detection result that tracks the target well]

And a bad case:

[screenshot: detection box far off the intended target]

Here is my code:

    func captureOutput(_ output: AVCaptureOutput, didOutput sampleBuffer: CMSampleBuffer, from connection: AVCaptureConnection) {
        // Grab the pixel buffer from the sample buffer; bail out if it's unavailable.
        guard let pixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer) else { return }

        // Create the rectangle-detection request with a completion handler.
        let request2 = VNDetectRectanglesRequest { (request, error) in
            self.VNDetectRectanglesRequestCompletionBlock(request: request, error: error)
        }

        do {
            request2.minimumConfidence = 0.7
            try self.visionSequenceHandler.perform([request2], on: pixelBuffer)
        } catch {
            print("Throws: \(error)")
        }
    }


    func VNDetectRectanglesRequestCompletionBlock(request: VNRequest, error: Error?) {
        // Take the first rectangle observation, if any were found.
        guard let array = request.results, !array.isEmpty,
              let ob = array.first as? VNRectangleObservation else { return }
        print("count: \(array.count)")
        print("fps: \(self.measureFPS())")
        DispatchQueue.main.async {
            // Convert the normalized bounding box into the preview layer's coordinate space.
            var transformedRect = ob.boundingBox
            //transformedRect.origin.y = 1 - transformedRect.origin.y
            let convertedRect = self.cameraLayer.layerRectConverted(fromMetadataOutputRect: transformedRect)

            self.highlightView?.frame = convertedRect
        }
    }

There are 2 answers

rickster (accepted answer):

There are a lot of misconception, expectation, and black-box issues that have already been brought up. But aside from those, you're also using the API incorrectly.

The rectangle detector finds areas in the image that appear to represent real-world rectangular shapes. In most cases, the camera capturing an image sees a real rectangular object in perspective — so its 3D projection onto the 2D image plane will usually not be rectangular. For example, the 2D projection of the computer screen in one of your photos is more trapezoidal, because the top corners are farther from the camera than the bottom corners.

You get this shape by looking at the actual corners of the detected rectangle — see the properties of the VNRectangleObservation object. If you draw lines between those four corners, you'll usually find something that better tracks the shape of a computer screen, piece of paper, etc. in your photo.

The boundingBox property instead gets you the smallest rectangular area — that is, rectangular in image space — containing those corner points. So it won’t follow the shape of a real rectangular object unless your camera perspective is just right.
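
If it helps, here is a rough sketch of what drawing the quadrilateral from the four corner points could look like. The helper name, the outlineLayer shape layer, and the y-flip before converting to layer coordinates are my assumptions, not part of the original code:

    import AVFoundation
    import UIKit
    import Vision

    // Sketch: draw the detected quadrilateral from the observation's corner
    // points instead of using boundingBox. Assumes `cameraLayer` is an
    // AVCaptureVideoPreviewLayer and `outlineLayer` is a CAShapeLayer that has
    // already been added above the preview (both names are hypothetical).
    func drawOutline(for observation: VNRectangleObservation,
                     on cameraLayer: AVCaptureVideoPreviewLayer,
                     in outlineLayer: CAShapeLayer) {
        // Vision's normalized points use a bottom-left origin; the preview layer's
        // capture-device point conversion expects a top-left origin, so flip y first.
        func convert(_ point: CGPoint) -> CGPoint {
            let flipped = CGPoint(x: point.x, y: 1 - point.y)
            return cameraLayer.layerPointConverted(fromCaptureDevicePoint: flipped)
        }

        let path = UIBezierPath()
        path.move(to: convert(observation.topLeft))
        path.addLine(to: convert(observation.topRight))
        path.addLine(to: convert(observation.bottomRight))
        path.addLine(to: convert(observation.bottomLeft))
        path.close()

        outlineLayer.path = path.cgPath
    }

Calling this from the request's completion handler (on the main queue) should give you an outline that hugs the screen or sheet of paper even when it appears as a trapezoid.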

broder:

Your commented-out line is almost right; you need to put it back, but change it to:

    transformedRect.origin.y = 1 - (transformedRect.origin.y + transformedRect.height)

In your 'bad case' example, the square is actually from the soft toy on the right. Your good ones look right because they are in the centre of the screen.
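
For reference, here is a small sketch of that corrected flip pulled out into a helper. It assumes the same cameraLayer (an AVCaptureVideoPreviewLayer) and highlightView from the question; the helper itself is hypothetical:

    import AVFoundation
    import Vision

    // Sketch: flip the Vision bounding box (bottom-left origin) into the
    // top-left-origin space that metadata-output rects use, then convert it
    // into the preview layer's coordinates.
    func highlightFrame(for observation: VNRectangleObservation,
                        in cameraLayer: AVCaptureVideoPreviewLayer) -> CGRect {
        var rect = observation.boundingBox
        // Mirror the rect vertically within the unit square.
        rect.origin.y = 1 - (rect.origin.y + rect.height)
        return cameraLayer.layerRectConverted(fromMetadataOutputRect: rect)
    }

You would then assign the returned rect to highlightView?.frame on the main queue, as in the question's completion handler.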