An image or PDF may contain:

- printed text,
- handwritten text,
- paragraphs,
- key-value pairs,
- complex tables.
During training, we assign tags/keywords to each document. During testing, we look up the tag and read back the result stored for it.
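The tag/keyword scheme above can be sketched as a plain lookup table. All names here (`train`, `lookup`, the invoice tags) are illustrative, not from any particular library:

```python
# Illustrative sketch: at training time, associate each tag seen for a
# document with the parsed result; at test time, read the result back.
tag_index = {}

def train(doc_id, tags, result):
    # Store the parsed result under every tag assigned to the document.
    for tag in tags:
        tag_index[tag] = result

def lookup(tag):
    # At test time, return the result stored for the tag (None if unseen).
    return tag_index.get(tag)

train("invoice-001", ["invoice_number", "total"], {"total": "42.00"})
print(lookup("total"))        # the stored result
print(lookup("unknown_tag"))  # None for an unseen tag
```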
You need three steps:
First, write a basic object-detection algorithm for the image. It must crop the image into ROIs (regions of interest), then classify each ROI against your content-type list. For this part you can use heuristic rules to extract ROI features (tables, for example, often have rectangular boundaries), then feed those features to a lightweight classifier such as a decision tree.
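A minimal sketch of this first step, using only NumPy: ROIs are found with a horizontal-projection heuristic (bands of ink separated by blank rows), and each ROI is classified by the trivial "ruled border" rule from the text. In practice `cv2.findContours` and `cv2.boundingRect` do this much better; everything here is illustrative, standing in for the decision tree:

```python
import numpy as np

def find_rois(page):
    """Split a 2-D 0/1 page array into (top, bottom) row bands with ink."""
    ink_rows = page.sum(axis=1) > 0
    rois, start = [], None
    for y, has_ink in enumerate(ink_rows):
        if has_ink and start is None:
            start = y
        elif not has_ink and start is not None:
            rois.append((start, y))
            start = None
    if start is not None:
        rois.append((start, len(ink_rows)))
    return rois

def classify_roi(band):
    """Heuristic stand-in for a classifier: tables have ruled borders."""
    return "table" if band[0].mean() > 0.9 and band[-1].mean() > 0.9 else "text"

page = np.zeros((12, 10), dtype=int)
page[1, 2:7] = 1                      # a short line of "text"
page[4, :] = 1; page[8, :] = 1        # ruled top/bottom border of a "table"
page[5:8, 0] = 1; page[5:8, -1] = 1   # side borders of the "table"

for top, bottom in find_rois(page):
    print((top, bottom), classify_roi(page[top:bottom]))
# (1, 2) text
# (4, 9) table
```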
Next, provide an algorithm that reads the data structure implied by each ROI type. For a table, for example, you need to locate every cell in the image. Then find each word or number inside that structure and crop it into sets of individual symbols.
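The symbol-cropping part of this step can be sketched with a vertical-projection cut: blank columns between glyphs mark the boundaries (the same projection idea applies to finding table cells along ruled lines). This is purely illustrative; real documents need smarter segmentation:

```python
import numpy as np

def split_symbols(word):
    """Return one crop per glyph from a 0/1 word image, cutting at blank columns."""
    ink_cols = word.sum(axis=0) > 0
    symbols, start = [], None
    for x, has_ink in enumerate(ink_cols):
        if has_ink and start is None:
            start = x
        elif not has_ink and start is not None:
            symbols.append(word[:, start:x])
            start = None
    if start is not None:
        symbols.append(word[:, start:])
    return symbols

word = np.zeros((5, 11), dtype=int)
word[:, 1:3] = 1   # first glyph
word[:, 5:8] = 1   # second glyph
print([s.shape for s in split_symbols(word)])   # [(5, 2), (5, 3)]
```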
Once that is done, classify each symbol with your text-image classifier. At this step you can use, for example, a Multilayer Perceptron, a Naive Bayes classifier, or any other classifier commonly used for image recognition.
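A hedged sketch of this classification step, using a Bernoulli Naive Bayes over raw pixels (one of the classifier families mentioned above; an MLP would slot in the same way). The two synthetic 3x3 glyphs are made up for illustration; a real system trains on many labelled crops:

```python
import numpy as np

def train_nb(samples, labels, alpha=1.0):
    """Fit a Bernoulli Naive Bayes: per-class priors and per-pixel on-probabilities."""
    classes = sorted(set(labels))
    priors, pixel_probs = {}, {}
    for c in classes:
        xs = np.array([s.ravel() for s, l in zip(samples, labels) if l == c])
        priors[c] = len(xs) / len(samples)
        # Laplace-smoothed probability that each pixel is on for class c.
        pixel_probs[c] = (xs.sum(axis=0) + alpha) / (len(xs) + 2 * alpha)
    return priors, pixel_probs

def predict_nb(model, sample):
    """Return the class with the highest log-posterior for a 0/1 symbol crop."""
    priors, pixel_probs = model
    x = sample.ravel()
    best, best_score = None, -np.inf
    for c, p in pixel_probs.items():
        score = np.log(priors[c]) + np.sum(x * np.log(p) + (1 - x) * np.log(1 - p))
        if score > best_score:
            best, best_score = c, score
    return best

I = np.array([[0, 1, 0], [0, 1, 0], [0, 1, 0]])   # a crude "I"
O = np.array([[1, 1, 1], [1, 0, 1], [1, 1, 1]])   # a crude "O"
model = train_nb([I, O], ["I", "O"])
print(predict_nb(model, I))   # I
print(predict_nb(model, O))   # O
```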
In practice, you could try the OpenCV library, which already implements almost all the algorithms you need.
For a better understanding of the third step, you could look at my captcha-recognition project, which is based on OpenCV's Artificial Neural Network feature.