I have the above image and would like to slice it into individual questions.
I would like to do it programmatically using python and image libraries.
You've tagged this as a Python question, but rather than provide code I'll give you a few pointers to start you on your way. If you need help choosing an OCR library and using it, you'll need to ask additional questions. Your question suggests you're looking for an approach (which is fine for Stack Overflow) rather than an exhaustive answer with code.
First, I'd suggest looking for the simplest solution. Maybe that's all you need! If you're not asked to do more than separate the questions, find the most straightforward technique that will do the job (and that you can tolerate debugging).
A few observations:
To keep the problem simple, consider ways you could find these rectangles:
More briefly:
You may find a simpler technique than this, but at the very least this should give you some ideas about solving the problem with OCR, regular expressions, and some geometry.
Good luck!
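To illustrate the OCR-plus-regular-expression idea, here is a minimal sketch of picking out question numbers from OCR output. It assumes the OCR step has already produced one dict per word with `text`, `left`, and `top` keys (for example, by reshaping the result of `pytesseract.image_to_data`), and that question numbers look like `1.` or `12)`. The function name `find_question_anchors`, the pattern, and the sample words are all hypothetical and would need adjusting to the actual image.

```python
import re

# Assumption: a question number is 1-3 digits followed by "." or ")".
QUESTION_NUMBER = re.compile(r"\d{1,3}[.)]")


def find_question_anchors(words):
    """Return sorted (number, left, top) tuples for words that look like question numbers.

    `words` is a list of dicts with 'text', 'left' and 'top' keys,
    one per word recognised by the OCR step.
    """
    anchors = []
    for word in words:
        text = word["text"].strip()
        # fullmatch rejects tokens such as "3.5" that merely start like a number.
        if QUESTION_NUMBER.fullmatch(text):
            anchors.append((int(text[:-1]), word["left"], word["top"]))
    return sorted(anchors)


# Made-up OCR output for a page with four questions in two columns.
sample_words = [
    {"text": "1.", "left": 40, "top": 50},
    {"text": "Which", "left": 80, "top": 50},
    {"text": "3.", "left": 620, "top": 52},
    {"text": "3.5", "left": 200, "top": 100},
    {"text": "2.", "left": 41, "top": 400},
    {"text": "4.", "left": 622, "top": 398},
]

anchors = find_question_anchors(sample_words)
```

Each anchor gives the top-left corner from which a question's rectangle can be grown.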
Finding the four question numbers and first word of text:
The bottom right rectangle:
The top right rectangle:
Left rectangle, which initially overlaps the bottom leftmost question:
Bottom left rectangle:
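The geometry behind rectangles like these can be sketched as follows. This is one possible approach under stated assumptions, not necessarily the one used to produce the figures: questions are laid out in columns, each rectangle starts at its question number, extends right to the next column (or the image edge), and extends down to the next question number in the same column (or the image bottom), which is what stops an upper rectangle from overlapping the question below it. The function `question_boxes` and its `column_gap` and `margin` parameters are hypothetical.

```python
def question_boxes(anchors, width, height, column_gap=100, margin=10):
    """Map each question number to a (left, top, right, bottom) crop box.

    `anchors` is a list of (number, x, y) tuples giving the top-left
    position of each question number; `width` and `height` are the
    image dimensions in pixels.
    """
    # Group anchors into columns: anchors whose x positions lie within
    # column_gap of a column's first anchor are assumed to share that column.
    columns = []
    for number, x, y in sorted(anchors, key=lambda a: a[1]):
        if columns and x - columns[-1][0][1] < column_gap:
            columns[-1].append((number, x, y))
        else:
            columns.append([(number, x, y)])

    boxes = {}
    for i, column in enumerate(columns):
        left = max(min(x for _, x, _ in column) - margin, 0)
        if i + 1 < len(columns):
            # Stop just before the next column begins.
            right = min(x for _, x, _ in columns[i + 1]) - margin
        else:
            right = width
        column.sort(key=lambda a: a[2])  # top to bottom
        for j, (number, _, y) in enumerate(column):
            top = max(y - margin, 0)
            # Stop just above the next question in this column, if any.
            bottom = column[j + 1][2] - margin if j + 1 < len(column) else height
            boxes[number] = (left, top, right, bottom)
    return boxes


# Made-up anchor positions for a 1200x800 image with a 2x2 layout.
example_anchors = [(1, 40, 50), (2, 41, 400), (3, 620, 52), (4, 622, 398)]
boxes = question_boxes(example_anchors, width=1200, height=800)
```

Each tuple is in the `(left, upper, right, lower)` order that Pillow's `Image.crop` expects, so `image.crop(boxes[1])` would cut out the first question.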
This answer is a good starting point; with some modification of the text separation and the if conditions, it can be adapted to suit your needs.