I am using the Tesseract library (3.2.0-alpha2) from NuGet. I also experimented with older versions and with the tessnet2 library, but did not get any positive results. As an example, I have two images: one containing multiple numbers and one containing a single number.
When I tried to recognize the image with multiple numbers, I only received the number '541' and none of the single-character numbers ('0'). When I tried to recognize the image with a single number, I got no result at all.
My code sample is below:
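using System;
using Tesseract;

using (var engine = new TesseractEngine(@"tessdata/", "eng"))
{
    // Only allow digits in the recognition result.
    engine.SetVariable("tessedit_char_whitelist", "0123456789");
    using (var img = Pix.LoadFromFile(@"multiple_numbers.bmp"))
    using (var page = engine.Process(img))
    using (var iterator = page.GetIterator())
    {
        // Print the full recognized text first.
        Console.WriteLine(page.GetText());
        iterator.Begin();
        do
        {
            // Print each recognized word; TryParse avoids an exception
            // when a word is empty, null or not a clean integer.
            var text = iterator.GetText(PageIteratorLevel.Word);
            if (int.TryParse(text, out var number))
                Console.WriteLine(number);
        }
        while (iterator.Next(PageIteratorLevel.Word));
    }
}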
I experimented with PageIteratorLevel for the iterator, EngineMode for the engine, and PageSegMode for processing, without any success. Please help me fix this. My main goal is to get all the numbers from the image. I am willing to switch to a different recognition library if it offers a simpler way.
This answer may be late, but I am writing it for anyone with a similar problem.
The problem can be worked around by changing the page segmentation mode to "single line of text", so that Tesseract does not search for pages and paragraphs.
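A minimal C# sketch of that workaround follows. It assumes the Tesseract .NET wrapper version in use exposes an `engine.Process(img, PageSegMode.SingleLine)` overload (later releases of the wrapper do; check your version). `PageSegMode.SingleLine` corresponds to Tesseract's "treat the image as a single text line" mode. The helpers `RecognizeNumbers` and `ParseNumbers` are hypothetical names introduced only for illustration: instead of walking words with an iterator, the sketch splits the recognized line on whitespace and keeps only tokens that parse as integers.

```csharp
using System;
using System.Collections.Generic;
using Tesseract;

class Program
{
    // Hypothetical helper: splits recognized text on whitespace and keeps only
    // tokens that parse as integers (stray or empty tokens are skipped).
    public static List<int> ParseNumbers(string text)
    {
        var numbers = new List<int>();
        foreach (var token in (text ?? string.Empty).Split((char[])null, StringSplitOptions.RemoveEmptyEntries))
        {
            if (int.TryParse(token, out var value))
                numbers.Add(value);
        }
        return numbers;
    }

    // Hypothetical helper: runs OCR with the single-line page segmentation mode,
    // so Tesseract does not try to detect pages, blocks or paragraphs first.
    // Assumption: this Process(Pix, PageSegMode) overload exists in your wrapper version.
    public static List<int> RecognizeNumbers(TesseractEngine engine, string imagePath)
    {
        using (var img = Pix.LoadFromFile(imagePath))
        using (var page = engine.Process(img, PageSegMode.SingleLine))
        {
            // The page is disposed when this block exits, so the engine is free
            // to process the next image.
            return ParseNumbers(page.GetText());
        }
    }

    static void Main()
    {
        using (var engine = new TesseractEngine(@"tessdata/", "eng"))
        {
            // Restrict recognition to digits, as in the question's code.
            engine.SetVariable("tessedit_char_whitelist", "0123456789");

            foreach (var number in RecognizeNumbers(engine, @"multiple_numbers.bmp"))
                Console.WriteLine(number);
        }
    }
}
```

Keeping the parsing in its own method makes it easy to check separately from the OCR step. For an image that contains only one digit, `PageSegMode.SingleChar` may also be worth trying; that is a suggestion, not something verified here.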