I am trying to determine the number of lines of text in an image without running OCR. I want to skip OCR and return an error to the user if they have supplied too many lines of text to process (it would take too long, and it's not the kind of input the tool is meant for). Ideally I would like to do this in Python, but if there are C++ examples that do this, I may be able to adapt them.
Here are the Tesseract advanced API functions I can work with: http://zdenop.github.io/tesseract-doc/group___advanced_a_p_i.html
I can call these functions, but in Python I don't know how to work with BLOCK_LIST, ETEXT_DESC, or Boxa objects other than passing them from one API call to another.
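One way to avoid handling BLOCK_LIST and Boxa objects directly is to use a wrapper that converts them into Python types. The sketch below assumes the third-party tesserocr package (not mentioned above), whose GetComponentImages call returns a Python list instead of a Boxa. When Recognize() has not been called, Tesseract's GetComponentImages should fall back to layout analysis only, so no character recognition is performed. The names count_text_lines, check_line_limit and TooManyLinesError are hypothetical helpers, and the limit is whatever value suits your input. Treat it as a minimal sketch under these assumptions.

```python
class TooManyLinesError(ValueError):
    """Raised when an image has more text lines than we are willing to OCR."""


def check_line_limit(line_count, max_lines):
    """Return line_count unchanged, or raise TooManyLinesError if it exceeds max_lines."""
    if line_count > max_lines:
        raise TooManyLinesError(
            f"Found {line_count} text lines; at most {max_lines} are allowed."
        )
    return line_count


def count_text_lines(image_path, lang="eng"):
    """Count text lines using Tesseract layout analysis only (no recognition).

    Assumes the third-party tesserocr wrapper is installed.
    """
    # Imported lazily so check_line_limit() still works without tesserocr.
    from tesserocr import PyTessBaseAPI, RIL

    with PyTessBaseAPI(lang=lang) as api:
        api.SetImageFile(image_path)
        # Recognize() has not been called, so only page layout analysis runs.
        # The result is a plain Python list of tuples (a cropped image and a
        # bounding-box dict per line, plus ids), not a Boxa.
        lines = api.GetComponentImages(RIL.TEXTLINE, True)
    return len(lines)
```

Calling check_line_limit(count_text_lines(path), max_lines=50) before the real OCR call would raise the error before any recognition work starts. The value 50 is only an example.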
Any help would be greatly appreciated!
This may not be the best way, but it runs in just a few seconds and, based on the number of symbols found, lets me know when to cancel OCR because it will take longer than expected (assuming the OCR operation runs in its own thread that can be killed). You can also count the number of text lines (RIL_TEXTLINE), but if the page has multiple columns, each column's lines are counted separately, so you'll get many more lines than there are visual rows of text.
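To work around the multiple-column problem, one option is to merge text-line boxes whose vertical extents overlap into a single visual row before counting. The sketch below is a hypothetical heuristic, not a Tesseract feature. It assumes each box is a dict with "y" (top edge) and "h" (height) keys, which is the shape tesserocr's GetComponentImages uses for its bounding boxes. The 50% overlap threshold (min_overlap) is an arbitrary assumption that you should tune.

```python
def count_visual_rows(boxes, min_overlap=0.5):
    """Count rows of text, treating side-by-side column lines as one row.

    boxes: iterable of dicts with "y" (top edge) and "h" (height) keys.
    A box joins the current row when its vertical overlap with that row is
    at least min_overlap times the smaller of (box height, row height).
    """
    # Sort boxes top to bottom as (top, bottom) spans.
    spans = sorted((b["y"], b["y"] + b["h"]) for b in boxes)
    rows = 0
    row_top = row_bottom = None
    for top, bottom in spans:
        if row_bottom is not None:
            overlap = min(bottom, row_bottom) - max(top, row_top)
            smaller = min(bottom - top, row_bottom - row_top)
            if smaller > 0 and overlap >= min_overlap * smaller:
                # Same visual row: extend it downward if this box reaches lower.
                row_bottom = max(row_bottom, bottom)
                continue
        # Otherwise this box starts a new visual row.
        rows += 1
        row_top, row_bottom = top, bottom
    return rows
```

Passing it the box dicts from the RIL.TEXTLINE results gives a count that tracks the page's height in lines rather than its number of columns. A single very tall box, such as a figure misdetected as text, can still swallow neighbouring rows, so the threshold matters.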