Tesseract: Recognizing simple numbers in C#


I'm using the Tesseract library (3.2.0-alpha2) from NuGet. I have also tried older versions and the tessnet2 library, without any success. As samples I have two images: one with multiple numbers and one with a single number.

When I try to recognize the image with multiple numbers, I only get the number '541'; the numbers that consist of a single character ('0') are missing. When I try to recognize the image with a single number, I get no result at all.

Here is my code:

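        using (var engine = new TesseractEngine(@"tessdata/", "eng"))
        {
            // Restrict recognition to the digits 0-9.
            engine.SetVariable("tessedit_char_whitelist", "0123456789");

            using (var img = Pix.LoadFromFile(@"multiple_numbers.bmp"))
            using (var page = engine.Process(img))
            using (var iterator = page.GetIterator())
            {
                // Print the full recognized text of the page.
                Console.WriteLine(page.GetText());
                iterator.Begin();

                // Walk the recognition results word by word.
                do
                {
                    var text = iterator.GetText(PageIteratorLevel.Word);
                    // GetText can return null or non-numeric text; TryParse avoids an exception.
                    if (int.TryParse(text, out var number))
                    {
                        Console.WriteLine(number);
                    }
                }
                while (iterator.Next(PageIteratorLevel.Word));
            }
        }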

I have experimented with the PageIteratorLevel for the iterator, the EngineMode for the engine, and the PageSegMode for processing, without any success. How can I fix this? My main goal is to extract all the numbers from an image. I'm open to switching to a different recognition library if it offers a simpler way.


Answer by Khai Vu:

This answer may be late, but I'm writing it for anyone who runs into a similar problem.

As a workaround, the problem can be solved by changing the page segmentation mode so that Tesseract does not search for pages and paragraphs and instead treats the image as a single block of text.

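        using var engine = new TesseractEngine("LanguageDataFolder", "eng", EngineMode.Default);
        // The key change: treat the image as a single uniform block of text instead of running full page layout analysis.
        engine.DefaultPageSegMode = PageSegMode.SingleBlock;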
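The asker's main goal is to get every number out of the image. Rather than calling int.Parse on each word (which throws when the iterator returns an empty or non-numeric word), one option is to scan the full text returned by page.GetText() for runs of digits. The following is a minimal sketch under that assumption; the OcrNumbers class and its ExtractNumbers method are hypothetical helpers, not part of the Tesseract library, and the code uses only the .NET base class library.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text.RegularExpressions;

// Hypothetical helper (not part of the Tesseract library): pulls every run of
// ASCII digits out of OCR output, e.g. the string returned by page.GetText().
public static class OcrNumbers
{
    // [0-9]+ rather than \d+ so that non-ASCII Unicode digits are not matched.
    private static readonly Regex DigitRun = new Regex("[0-9]+", RegexOptions.Compiled);

    public static List<int> ExtractNumbers(string ocrText)
    {
        var numbers = new List<int>();
        if (string.IsNullOrEmpty(ocrText))
        {
            // Tesseract recognized nothing: return an empty list instead of throwing.
            return numbers;
        }

        foreach (Match match in DigitRun.Matches(ocrText))
        {
            // TryParse skips digit runs that are too long to fit in an int.
            if (int.TryParse(match.Value, NumberStyles.None, CultureInfo.InvariantCulture, out var value))
            {
                numbers.Add(value);
            }
        }
        return numbers;
    }
}
```

With the engine configured as above, this would be called as `var numbers = OcrNumbers.ExtractNumbers(page.GetText());`. Scanning for digit runs tolerates line breaks and extra spaces in Tesseract's output, but it cannot recover a digit that Tesseract failed to detect in the first place; that is what the page segmentation mode change addresses.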