I am trying to extract the text so that it can be processed by OCR, but these dots add a lot of noise. Image: http://img22.imageshack.us/img22/1344/l0ap.png
Thanks in advance!
You already have a great solution here, but I still want to add another approach.
1- Apply a binary threshold with a very low value, so that only the darkest pixels (the digits) stay black.
2- Find all the contours and list their areas. Fill the small contours with white.
3- Try OCR. If it doesn't give you a numerical answer, add more preprocessing, like this:
while(!ocr)
{
    do morphological closing,
    do morphological opening,
    fill small blobs with white,
    try ocr.
}
The morphological operations will help you cut the small limbs (dots) off your blobs (numbers).
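Here is a rough sketch of the steps above, in case it helps to see them spelled out. It is plain Python on nested lists rather than OpenCV, so it runs without any dependencies; in practice you would probably reach for OpenCV's threshold, contour and morphology functions instead. All the names (`threshold`, `fill_small_blobs`, `open_ink`, `close_ink`, `clean_for_ocr`) and the default values (`t=60`, `min_area=4`, `max_attempts=3`) are my own assumptions, and `ocr` is a hypothetical callable that takes the cleaned image and returns a string.

```python
from collections import deque

INK, PAPER = 0, 255  # assumed convention: dark text on a light background


def threshold(gray, t):
    """Step 1: pixels at or below t become ink, everything else paper."""
    return [[INK if p <= t else PAPER for p in row] for row in gray]


def fill_small_blobs(binary, min_area):
    """Step 2: paint every 4-connected ink blob smaller than min_area white."""
    h, w = len(binary), len(binary[0])
    out = [row[:] for row in binary]
    seen = [[False] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if out[y][x] != INK or seen[y][x]:
                continue
            # Breadth-first flood fill collects one blob.
            blob, queue = [], deque([(y, x)])
            seen[y][x] = True
            while queue:
                cy, cx = queue.popleft()
                blob.append((cy, cx))
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if 0 <= ny < h and 0 <= nx < w and not seen[ny][nx] and out[ny][nx] == INK:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(blob) < min_area:
                for by, bx in blob:
                    out[by][bx] = PAPER
    return out


def _morph(binary, grow_ink):
    """3x3 dilation (grow_ink=True) or erosion (grow_ink=False) of the ink.
    Neighbours outside the image are simply ignored."""
    h, w = len(binary), len(binary[0])
    out = [[PAPER] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            neigh = [binary[ny][nx]
                     for ny in range(max(0, y - 1), min(h, y + 2))
                     for nx in range(max(0, x - 1), min(w, x + 2))]
            hit = any(v == INK for v in neigh) if grow_ink else all(v == INK for v in neigh)
            out[y][x] = INK if hit else PAPER
    return out


def close_ink(binary):
    """Closing: dilate then erode, which bridges small gaps in the ink."""
    return _morph(_morph(binary, True), False)


def open_ink(binary):
    """Opening: erode then dilate, which cuts thin limbs off the ink."""
    return _morph(_morph(binary, False), True)


def clean_for_ocr(gray, ocr, t=60, min_area=4, max_attempts=3):
    """Step 3: retry OCR, adding closing/opening/blob filling after each failure.
    Returns the recognised digits, or None if every attempt fails."""
    img = fill_small_blobs(threshold(gray, t), min_area)
    for _ in range(max_attempts):
        text = ocr(img)
        if text.isdigit():
            return text
        img = fill_small_blobs(open_ink(close_ink(img)), min_area)
    return None
```

Unlike the `while(!ocr)` loop, this version caps the number of attempts so that it cannot spin forever on an image the OCR never manages to read. The 3x3 structuring element is just the smallest sensible choice; dots that are thicker than a pixel or two would need a larger one.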
I thought this looked like an interesting problem that MSER blob detection and inpainting could solve. Below is some code I tried. I don't think the result is acceptable as OCR input, but I've included it anyway in case it proves useful. The inpainting didn't extend the contour lines of the characters into the mask region in the manner I was hoping it would. I think a more promising approach would be as follows:
MSER+inpainting attempt:
Edit: mixed up curvature and bend radius.