I am using a HOG feature detector with SVM classification. I can successfully extract the license plate, but the extracted plate image contains some unnecessary pixels/lines apart from the license number. My image processing pipeline is as follows:
- Applying HOG detector on the grayscale image
- Cropping detected region
- Re-sizing the cropped image
- Applying an adaptive threshold to highlight the plate numbers and filter out the background, using the following OpenCV code:

    cvAdaptiveThreshold(cropped_plate, thresholded_plate, 255, CV_ADAPTIVE_THRESH_GAUSSIAN_C, CV_THRESH_BINARY_INV, 11, 9);

- De-skewing the plate image
Because of this unnecessary information, Tesseract OCR has trouble recognizing the numbers correctly. The extracted number plate images look like the following.
How can I filter these unnecessary pixels/lines out of the images? Any help will be appreciated.
You want to remove all non-text objects from the image. To do that, I suggest sorting the blobs by the area of their bounding boxes, (maxy - miny)*(maxx - minx). Then do some statistical analysis on those areas: you know you are looking for a group of objects of similar size, which are the characters. Once you have identified the approximate size of a character, make a larger bounding box that estimates the extent of the whole text. Keep the small blobs that fall inside it, so for your picture the dash sign will be preserved, and discard everything else.
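The filtering idea above could be sketched roughly as follows. This is only an illustration under several assumptions, not a definitive implementation: it uses plain C++ with no OpenCV calls, and it assumes the blob bounding boxes have already been extracted (for example by a connected-component or contour pass over the thresholded plate). The names `Box` and `filterPlateBlobs`, the use of the median area as the "typical character size", and the `tolerance` parameter are all hypothetical choices made for this sketch. The median only works if characters make up the majority of blobs; with a lot of speckle noise, a histogram of areas would probably be a better statistic.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Axis-aligned bounding box of one blob, in pixel coordinates.
struct Box {
    int minx, miny, maxx, maxy;
    long area() const {
        return static_cast<long>(maxy - miny) * (maxx - minx);
    }
};

// Hypothetical helper: keeps character-sized blobs plus any smaller blob
// lying inside the region spanned by those characters (e.g. a dash).
// Assumption: the median bounding-box area is the size of a character.
std::vector<Box> filterPlateBlobs(const std::vector<Box>& blobs,
                                  double tolerance = 0.5) {
    assert(tolerance >= 0.0 && tolerance <= 1.0);
    if (blobs.empty()) return {};

    // 1. Estimate the typical character area as the median of all areas.
    std::vector<long> areas;
    areas.reserve(blobs.size());
    for (const Box& b : blobs) areas.push_back(b.area());
    const std::ptrdiff_t mid = static_cast<std::ptrdiff_t>(areas.size() / 2);
    std::nth_element(areas.begin(), areas.begin() + mid, areas.end());
    const double typical = static_cast<double>(areas[mid]);

    // 2. Mark blobs close to the typical area as characters and grow a
    //    bounding box around all of them (the estimated text region).
    const int big = std::numeric_limits<int>::max();
    Box text{big, big, -big, -big};
    std::vector<bool> isChar(blobs.size(), false);
    for (std::size_t i = 0; i < blobs.size(); ++i) {
        const double a = static_cast<double>(blobs[i].area());
        if (a >= typical * (1.0 - tolerance) &&
            a <= typical * (1.0 + tolerance)) {
            isChar[i] = true;
            text.minx = std::min(text.minx, blobs[i].minx);
            text.miny = std::min(text.miny, blobs[i].miny);
            text.maxx = std::max(text.maxx, blobs[i].maxx);
            text.maxy = std::max(text.maxy, blobs[i].maxy);
        }
    }

    // 3. Keep characters, and small blobs fully inside the text region.
    //    Oversized blobs (frames, long lines) and outside specks are dropped.
    std::vector<Box> kept;
    for (std::size_t i = 0; i < blobs.size(); ++i) {
        const Box& b = blobs[i];
        const bool inside = b.minx >= text.minx && b.maxx <= text.maxx &&
                            b.miny >= text.miny && b.maxy <= text.maxy;
        if (isChar[i] || (b.area() < typical && inside)) kept.push_back(b);
    }
    return kept;
}
```

The boxes that come back can then be used to mask the thresholded image before it is passed to Tesseract. Note that area alone cannot tell a character from a line whose bounding box happens to have a similar area, so an additional aspect-ratio or height check may be worth adding.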