VNDocumentCameraViewController disable Auto Scan

VNDocumentCameraViewController scans documents automatically. I want to scan a document only when the user taps the shutter button. Is there any way to accomplish this?

Answer by Faisal Memon

It is not possible to do it directly because VNDocumentCameraViewController does not offer this as a public API.

However, it is possible to implement an equivalent experience: a camera view with a 'Capture' button that immediately takes a photo and runs text recognition on it.

The full code is quite lengthy, so I built on the work of Guillermo Moraleda and Andy Ibanez to produce a working prototype, DisableAutoScan. The relevant portions are shown below.

Use the 'Manual' menu button to show the on-shutter experience, and the 'Auto' menu button to show the VNDocumentCameraViewController experience.

In essence, the solution uses AVFoundation classes to run a camera capture session with a live preview. When the Capture button is pressed, a photo is requested from the AVCapturePhotoOutput and then passed to a VNRecognizeTextRequest, so text recognition happens on demand.

Here is the code which captures the photo on-demand from a live camera view:

//
//  CameraCaptureUIViewController.swift
//  TextRecognizer
//
//  Created by Faisal Memon on 23/02/2023.
//  Copyright © 2023 Guillermo Moraleda. All rights reserved.
//

import UIKit
import AVFoundation

class CameraCaptureUIViewController: UIViewController, AVCapturePhotoCaptureDelegate {

    var captureSession: AVCaptureSession!
    var stillImageOutput: AVCapturePhotoOutput!
    var videoPreviewLayer: AVCaptureVideoPreviewLayer!
    var capturedImage: UIImage?
    
    var imageWasCapturedClosure: ((UIImage) -> Void)?
    
    @IBOutlet weak var cameraCaptureView: UIView!
    
    @IBAction func capturePressed(_ sender: Any) {
        let settings = AVCapturePhotoSettings(format: [AVVideoCodecKey: AVVideoCodecType.jpeg])
        stillImageOutput.capturePhoto(with: settings, delegate: self)
    }
    
    override func viewDidLoad() {
        super.viewDidLoad()
    }
    
    override func viewDidAppear(_ animated: Bool) {
        super.viewDidAppear(animated)
        captureSession = AVCaptureSession()
        captureSession.sessionPreset = .medium
        guard let backCamera = AVCaptureDevice.default(for: AVMediaType.video)
            else {
                print("Unable to access back camera!")
                return
        }
        do {
            let input = try AVCaptureDeviceInput(device: backCamera)
            stillImageOutput = AVCapturePhotoOutput()
            if captureSession.canAddInput(input) && captureSession.canAddOutput(stillImageOutput) {
                captureSession.addInput(input)
                captureSession.addOutput(stillImageOutput)
                setupLivePreview()
            }
        }
        catch let error  {
            print("Error Unable to initialize back camera:  \(error.localizedDescription)")
        }
    }
    
    override func viewWillDisappear(_ animated: Bool) {
        super.viewWillDisappear(animated)
        // Optional chaining avoids a crash if the session was never created.
        captureSession?.stopRunning()
    }
    
    func setupLivePreview() {
        
        videoPreviewLayer = AVCaptureVideoPreviewLayer(session: captureSession)
        
        videoPreviewLayer.videoGravity = .resizeAspect
        videoPreviewLayer.connection?.videoOrientation = .portrait
        cameraCaptureView.layer.addSublayer(videoPreviewLayer)
        
        DispatchQueue.global(qos: .userInitiated).async {
            self.captureSession.startRunning()
            DispatchQueue.main.async {
                self.videoPreviewLayer.frame = self.cameraCaptureView.bounds
            }
        }
    }
    
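    func photoOutput(_ output: AVCapturePhotoOutput, didFinishProcessingPhoto photo: AVCapturePhoto, error: Error?) {
        
        // Report a failed capture instead of silently ignoring the error.
        if let error = error {
            print("Photo capture failed: \(error.localizedDescription)")
            return
        }
        
        guard let imageData = photo.fileDataRepresentation()
            else { return }
        
        // Hand the captured image back to the caller, then dismiss the camera view.
        capturedImage = UIImage(data: imageData)
        if let gotImage = capturedImage {
            imageWasCapturedClosure?(gotImage)
        }
        self.navigationController?.popViewController(animated: true)
        
    }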
}
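
The view controller above assumes the app already has permission to use the camera. A minimal sketch of a permission check that could run before the session is configured is shown below. It assumes an `NSCameraUsageDescription` entry in Info.plist, and the names `CameraAccessAction`, `cameraAccessAction(for:)` and `withCameraAccess(_:)` are hypothetical helpers of mine, not part of the prototype.

```swift
import AVFoundation

/// Hypothetical helper type: what the caller should do for a given status.
enum CameraAccessAction {
    case proceed   // access was already granted
    case request   // the user has not been asked yet
    case blocked   // denied or restricted; capture cannot start
}

/// Maps the system authorization status to an action (pure, so easy to test).
func cameraAccessAction(for status: AVAuthorizationStatus) -> CameraAccessAction {
    switch status {
    case .authorized: return .proceed
    case .notDetermined: return .request
    case .denied, .restricted: return .blocked
    @unknown default: return .blocked
    }
}

/// Calls `onReady` on the main queue with `true` if the camera may be used,
/// prompting the user first when they have not been asked before.
func withCameraAccess(_ onReady: @escaping (Bool) -> Void) {
    let status = AVCaptureDevice.authorizationStatus(for: .video)
    switch cameraAccessAction(for: status) {
    case .proceed:
        DispatchQueue.main.async { onReady(true) }
    case .request:
        AVCaptureDevice.requestAccess(for: .video) { granted in
            DispatchQueue.main.async { onReady(granted) }
        }
    case .blocked:
        DispatchQueue.main.async { onReady(false) }
    }
}
```

One way to use it would be to wrap the session setup in viewDidAppear in `withCameraAccess { granted in ... }` and only create the AVCaptureSession when `granted` is true.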

On the other side, the calling view controller sets the imageWasCapturedClosure closure to receive the captured image. In my example project I have:

override func prepare(for segue: UIStoryboardSegue, sender: Any?) {
    if let cameraCaptureVC = segue.destination as? CameraCaptureUIViewController {
        cameraCaptureVC.imageWasCapturedClosure = { image in
            self.scannedImaged = image
        }
    }
}

private var scannedImaged: UIImage = UIImage() {
    didSet {
        imageView.image = scannedImaged
        recognizeText()
    }
}

// Requires `import Vision` at the top of the file.
func recognizeText() {
    guard let imageData = scannedImaged.pngData() else { return }
    
    // Text recognition is slow, so run it off the main thread.
    DispatchQueue.global(qos: .userInitiated).async {
        let request = VNRecognizeTextRequest()
        
        let handler = VNImageRequestHandler(data: imageData)
        do {
            try handler.perform([request])
        } catch {
            // Report the failure and leave the current text unchanged.
            print("Text recognition failed: \(error.localizedDescription)")
            return
        }
        
        // An image without text gives no observations; that is not an error.
        let observations = (request.results as? [VNRecognizedTextObservation]) ?? []
        
        var recognizedText = ""
        for observation in observations {
            guard let bestCandidate = observation.topCandidates(1).first else { continue }
            
            recognizedText += " \(bestCandidate.string)"
        }
        
        // Update the UI-facing property on the main thread.
        DispatchQueue.main.async {
            self.recognizedText = recognizedText
        }
    }
}
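
One caveat with recognizeText() above: `pngData()` does not store the image's orientation, so a photo captured in portrait may reach Vision rotated. A sketch of one way around this passes the `CGImage` together with an explicit orientation to `VNImageRequestHandler`. The `CGImagePropertyOrientation` initializer and the `recognizedText(in:)` function are illustrative names of mine, not part of the prototype, and the typed `request.results` assumes a recent SDK (Xcode 13 or later).

```swift
import UIKit
import Vision
import ImageIO

// Illustrative mapping from UIKit's orientation to the ImageIO one Vision expects.
extension CGImagePropertyOrientation {
    init(_ orientation: UIImage.Orientation) {
        switch orientation {
        case .up: self = .up
        case .upMirrored: self = .upMirrored
        case .down: self = .down
        case .downMirrored: self = .downMirrored
        case .left: self = .left
        case .leftMirrored: self = .leftMirrored
        case .right: self = .right
        case .rightMirrored: self = .rightMirrored
        @unknown default: self = .up
        }
    }
}

/// Hypothetical helper: runs text recognition synchronously and returns the
/// best candidate of each observation joined by spaces.
/// Call it off the main thread, as recognizeText() does.
func recognizedText(in image: UIImage) throws -> String {
    // An image with no CGImage backing cannot be analysed; return no text.
    guard let cgImage = image.cgImage else { return "" }

    let request = VNRecognizeTextRequest()
    request.recognitionLevel = .accurate

    let handler = VNImageRequestHandler(
        cgImage: cgImage,
        orientation: CGImagePropertyOrientation(image.imageOrientation),
        options: [:]
    )
    try handler.perform([request])

    // Assumes the typed `results` property of recent SDKs.
    let observations = request.results ?? []
    return observations
        .compactMap { $0.topCandidates(1).first?.string }
        .joined(separator: " ")
}
```

Inside the background block of recognizeText(), this could be called as `let text = (try? recognizedText(in: image)) ?? ""`, where `image` is a copy of scannedImaged taken before dispatching.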

Note on user interaction

I initially found the auto-capture feature of the document camera view controller annoying, but it becomes quite useful once you are a heavy user, because of the ergonomics of the interaction.

When you have a lot of scans to do, one hand points the camera and the other presents the paper to scan. Moving the phone from the paper to a dead zone (say, your stomach) and back lets you move to the next page without an unplanned image capture. This makes scanning quick and effective.

The experience you want is more helpful to occasional scanners, for whom deliberate, step-by-step control feels more natural because the task is unfamiliar or infrequent.