How to extract table data from an image using Vision Framework in iOS?

645 views Asked by At

With iOS Vision Framework, I am able to perform OCR and fetch recognized text from an image using

VNRecognizedTextObservation

Now let say, I have an image in which there is some text paragraph along with a table. The table has many columns and associated rows with it(Refer below image). Is it possible to recognize a particular column's key and values from the table using Vision?

For example, I want to fetch 2014 Retail Sales numbers alone from the below image using Vision. How to do this? Can we use both Vision and CoreML to do this?

enter image description here

1

There are 1 answers

0
Surajkumbhar904 On

Yes it is possible using the vision

 guard let cgImage = image.cgImage else { return }
    let imageRequestHandler = VNImageRequestHandler(cgImage: cgImage, orientation: .right)
    
    let size = CGSize(width: cgImage.width, height: cgImage.height) // note, in pixels from `cgImage`; this assumes you have already rotate, too
    let bounds = CGRect(origin: .zero, size: size)
    // Create a new request to recognize text.
    let request = VNRecognizeTextRequest { [self] request, error in
        guard
            let results = request.results as? [VNRecognizedTextObservation],
            error == nil
        else { return }
        
        let rects = results.map {
            convert(boundingBox: $0.boundingBox, to: CGRect(origin: .zero, size: size))
        }

    func convert(boundingBox: CGRect, to bounds: CGRect) -> CGRect {
    let imageWidth = bounds.width
    let imageHeight = bounds.height
    
    // Begin with input rect.
    var rect = boundingBox
    
    // Reposition origin.
    rect.origin.x *= imageWidth
    rect.origin.x += bounds.minX
    rect.origin.y = (1 - rect.maxY) * imageHeight + bounds.minY
    
    // Rescale normalized coordinates.
    rect.size.width *= imageWidth
    rect.size.height *= imageHeight
    
    return rect
}

you can detect the column's key bounding box and extend the bounding box height

 var targetBoundingBox: CGRect?
        var targetWord = "Retailer"
        
        for result in results {
            if let candidate = result.topCandidates(1).first, candidate.string.lowercased().contains(targetWord.lowercased()) {
                targetBoundingBox = convert(boundingBox: result.boundingBox, to: CGRect(origin: .zero, size: size))

                targetBoundingBox?.size.height += bounds.maxY
                break
            }
        }

        if let targetBoundingBox = targetBoundingBox {
            print("Bounding box of '\(targetWord)': \(targetBoundingBox)")
            var textInsideTargetBox = ""
            for result in results {
                let boundingBox = convert(boundingBox: result.boundingBox, to: CGRect(origin: .zero, size: size))
                if targetBoundingBox.intersects(boundingBox), let text = result.topCandidates(1).first?.string {
                    textInsideTargetBox += "\(text) "
                }
            }
            print(textInsideTargetBox,"string")
            let format = UIGraphicsImageRendererFormat()
            format.scale = 1
            let final = UIGraphicsImageRenderer(bounds: bounds, format: format).image { _ in
                image.draw(in: bounds)
                UIColor.green.setStroke()
                //                    for rect in rects {
                let path = UIBezierPath(rect: targetBoundingBox)
                path.lineWidth = 9
                path.stroke()
                //                    }
            }
            DispatchQueue.main.async { [self] in
                resultImage.image = final
            }

        } else {
            print("Bounding box of '\(targetWord)' not found")

        }