Extracting total price from a shopping bill


I am working on an application that needs to read the net price from a photo of any shopping bill. I have already extracted editable text from the bill images using the Tesseract OCR API. Now I need to print only the grand total from that text. How do I extract just the total price from a whole bill that also lists item names, quantities, and prices?


There is 1 answer

Pang Ho Ming On

Short answer: I don't think there is a ready-made method you can call directly.

You need to look into the .hocr file that Tesseract can produce (search for "hOCR" for background first). The .hocr file includes the bounding box of each piece of recognized text (its corner coordinates x0, y0, x1, y1), along with other details such as the language. Using these values, you can determine whether words are on the same line; the word "Total" and the total amount are very likely printed on the same line.
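As a rough sketch of that idea, the Python below reads word boxes out of hOCR markup and groups words whose boxes overlap vertically into text lines. It assumes the usual hOCR layout, where each word is a `span` with class `ocrx_word` and a `title` attribute containing `bbox x0 y0 x1 y1`. The names `HocrWords`, `parse_hocr_words` and `group_into_lines` are made up for this example, and the "vertical centre falls inside the line's box" rule is just one simple heuristic that will need tuning for skewed photos.

```python
import re
from html.parser import HTMLParser

# hOCR stores geometry in the title attribute, e.g. "bbox 10 50 70 70; x_wconf 96"
BBOX_RE = re.compile(r"bbox (\d+) (\d+) (\d+) (\d+)")


class HocrWords(HTMLParser):
    """Collect (text, x0, y0, x1, y1) for every ocrx_word span."""

    def __init__(self):
        super().__init__()
        self.words = []
        self._bbox = None  # bbox of the word span we are currently inside

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "span" and "ocrx_word" in (attr.get("class") or ""):
            match = BBOX_RE.search(attr.get("title") or "")
            self._bbox = tuple(map(int, match.groups())) if match else None

    def handle_data(self, data):
        # The first non-blank text after a word span opens is that word's text.
        if self._bbox is not None and data.strip():
            self.words.append((data.strip(), *self._bbox))
            self._bbox = None


def parse_hocr_words(hocr_text):
    parser = HocrWords()
    parser.feed(hocr_text)
    parser.close()
    return parser.words


def group_into_lines(words):
    """Put a word on an existing line if its vertical centre lies in that line's y-range."""
    lines = []  # each entry: [y0, y1, [words]]
    for word in sorted(words, key=lambda w: (w[2], w[1])):
        _, _, y0, _, y1 = word
        centre = (y0 + y1) / 2
        for line in lines:
            if line[0] <= centre <= line[1]:
                line[2].append(word)
                break
        else:
            lines.append([y0, y1, [word]])
    # Read each line left to right and join the words with spaces.
    return [
        " ".join(text for text, *_ in sorted(ws, key=lambda w: w[1]))
        for _, _, ws in lines
    ]
```

Calling `group_into_lines(parse_hocr_words(hocr_text))` on the contents of a .hocr file gives a list of plain text lines, so "Total" and the amount printed to its right end up in the same string even when they are far apart horizontally.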

From there you can shortlist the candidate words and apply some filtering logic (for example, removing all non-numeric characters or words) to get the total value.
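The shortlisting step could look something like the sketch below. It takes the receipt as a list of text lines, skips subtotal lines, and returns the last amount on a "grand total" line, falling back to the last plain "total" line. The keyword patterns, the `find_total` name, and the assumption that amounts are written with a dot and two decimals (such as 1,234.50) are all illustrative guesses, not something Tesseract provides; receipts in other locales will need different patterns.

```python
import re
from decimal import Decimal

# Assumed keyword and amount patterns; adjust for your receipts and locale.
GRAND_RE = re.compile(r"grand\s+total", re.IGNORECASE)
TOTAL_RE = re.compile(r"\btotal\b", re.IGNORECASE)
SUBTOTAL_RE = re.compile(r"sub[\s-]*total", re.IGNORECASE)
AMOUNT_RE = re.compile(r"\d[\d,]*\.\d{2}(?!\d)")


def find_total(lines):
    """Return the amount from the best 'total' line as a Decimal, or None."""
    fallback = None
    for line in lines:
        amounts = AMOUNT_RE.findall(line)
        if not amounts or SUBTOTAL_RE.search(line):
            continue  # no price on this line, or it is only a subtotal
        # The total is usually the right-most number on the line.
        value = Decimal(amounts[-1].replace(",", ""))
        if GRAND_RE.search(line):
            return value  # an explicit grand total wins immediately
        if TOTAL_RE.search(line):
            fallback = value  # remember the last plain "total" line seen
    return fallback
```

For example, `find_total(["Milk 2.50", "Sub-total 11.00", "Tax 1.40", "Total 12.40", "Cash 20.00"])` picks the amount from the "Total" line and ignores the subtotal and the cash tendered.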

P.S. My company is working on something similar, but we decided not to use Tesseract because it is rather slow and not easy to train (we deal with receipts in several languages). We use the Google Vision API instead.

Hope my answer helps :D