Tesseract Assert failed trainingsampleset.cpp line 622 with mftraining


When mftraining is executed on my training files, I get the following error message:

PS > mftraining -F font_properties -U unicharset -O lang.unicharset .\eng.ds-digital.exp0.box.tr .\eng.ds-digitalb.exp0.box.tr .\eng.ds-digitali.exp0.box.tr
Warning: No shape table file present: shapetable
Reading .\eng.ds-digital.exp0.box.tr ...
Reading .\eng.ds-digitalb.exp0.box.tr ...
Reading .\eng.ds-digitali.exp0.box.tr ...
Font id = -1/0, class id = 1/12 on sample 0
font_id >= 0 && font_id < font_id_map_.SparseSize():Error:Assert failed:in file ..\..\classify\trainingsampleset.cpp, line 622

A Windows dialog also appears stating "feature training for Tesseract has stopped working". Several posts around the net address this issue, but none of the solutions I have tried so far gets my data set through.

The folder in which the mftraining command is executed contains the following files:

eng.ds-digital.exp0.box
eng.ds-digital.exp0.box.tr
eng.ds-digital.exp0.box.txt
eng.ds-digital.exp0.tif
eng.ds-digitalb.exp0.box
eng.ds-digitalb.exp0.box.tr
eng.ds-digitalb.exp0.box.txt
eng.ds-digitalb.exp0.tif
eng.ds-digitali.exp0.box
eng.ds-digitali.exp0.box.tr
eng.ds-digitali.exp0.box.txt
eng.ds-digitali.exp0.tif
font_properties
unicharset

The font_properties file has the following content (it also ends with a newline, as the documentation states):

ds-digital 0 0 0 0 0
ds-digitalb 0 1 0 0 0
ds-digitali 1 0 0 0 0
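The format above is one font per line: the font name followed by five 0/1 flags (italic, bold, fixed, serif, fraktur, per the Tesseract 3.x training documentation). A minimal Python sanity check for the file might look like the sketch below; it assumes those are the only rules that matter and does not reproduce Tesseract's own parser. Because the commands are run from PowerShell, where `echo ... > file` in Windows PowerShell writes UTF-16 by default, the loader also rejects files that start with a byte-order mark. The function names are hypothetical.

```python
from pathlib import Path

# Flag order per the Tesseract 3.x training docs:
# <fontname> <italic> <bold> <fixed> <serif> <fraktur>
FLAG_NAMES = ("italic", "bold", "fixed", "serif", "fraktur")


def parse_font_properties(text: str) -> dict:
    """Map font name -> tuple of five 0/1 flags; raise ValueError if malformed."""
    if text and not text.endswith("\n"):
        raise ValueError("font_properties should end with a newline")
    fonts = {}
    for lineno, line in enumerate(text.splitlines(), start=1):
        parts = line.split()
        if len(parts) != 1 + len(FLAG_NAMES):
            raise ValueError(f"line {lineno}: expected 6 fields, got {parts!r}")
        name, flags = parts[0], parts[1:]
        if any(f not in ("0", "1") for f in flags):
            raise ValueError(f"line {lineno}: flags must be 0 or 1: {flags!r}")
        if name in fonts:
            raise ValueError(f"line {lineno}: duplicate font {name!r}")
        fonts[name] = tuple(int(f) for f in flags)
    return fonts


def load_font_properties(path: str) -> dict:
    """Read font_properties from disk, rejecting files that start with a BOM."""
    raw = Path(path).read_bytes()
    # UTF-16 LE, UTF-16 BE and UTF-8 byte-order marks.
    if raw.startswith((b"\xff\xfe", b"\xfe\xff", b"\xef\xbb\xbf")):
        raise ValueError(f"{path} starts with a byte-order mark; save it as plain text")
    return parse_font_properties(raw.decode("utf-8"))
```

Run from the training folder, load_font_properties("font_properties") either returns the three fonts with their flags or raises with the offending line number.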

I've also tried different naming conventions for the font name in font_properties (although the documentation is quite clear that the entry is the font name taken from the file name, not the whole file name, some people around the net seem to claim otherwise), and renaming the files so the .tr files follow the pattern eng.ds-digital*.exp0.tr, to no avail.
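The log line "Font id = -1/0" shows the assert failing because font_id is -1, which suggests the font name recorded for the samples was not matched to any font loaded from font_properties. One way to rule out a naming mismatch before running mftraining is to derive the expected font name from each training file name and compare it with font_properties. The sketch below assumes the [lang].[fontname].exp[num] convention from the training documentation; font_name_from_filename and unmatched_fonts are hypothetical helpers, and Tesseract's own extraction may differ in edge cases such as extra suffixes (.box) before .tr.

```python
import re

# Assumed naming convention: <lang>.<fontname>.exp<N> followed by any
# extensions, e.g. eng.ds-digitalb.exp0.box.tr -> "ds-digitalb".
TRAINING_NAME = re.compile(r"^[^.]+\.(?P<font>[^.]+)\.exp\d+(?:\..*)?$")


def font_name_from_filename(filename: str) -> str:
    """Hypothetical helper: extract <fontname> from a training file name."""
    # Drop any directory prefix, whether it uses / or \ separators.
    base = re.split(r"[\\/]", filename)[-1]
    match = TRAINING_NAME.match(base)
    if match is None:
        raise ValueError(f"{base!r} does not follow <lang>.<fontname>.exp<N>")
    return match.group("font")


def unmatched_fonts(filenames, known_fonts):
    """Return derived font names (first-seen order) missing from known_fonts."""
    missing = []
    for filename in filenames:
        name = font_name_from_filename(filename)
        if name not in known_fonts and name not in missing:
            missing.append(name)
    return missing


tr_files = [
    r".\eng.ds-digital.exp0.box.tr",
    r".\eng.ds-digitalb.exp0.box.tr",
    r".\eng.ds-digitali.exp0.box.tr",
]
print(unmatched_fonts(tr_files, {"ds-digital", "ds-digitalb", "ds-digitali"}))  # []
```

An empty list only shows that the file names and font_properties agree; it says nothing about the font name stored inside the .tr files themselves.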

Edit: I am running on Tesseract 3.02

Answer by N.Singh:

I was getting the same issue and resolved it by making sure the font name in eng.ds-digital.exp0.box.tr is the same as the one given in the font_properties file.

Example: echo "ds-digital 0 0 0 0 0" > font_properties

Then eng.ds-digital.exp0.box.tr should contain the font name ds-digital.
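Following that advice, a quick way to see whether a .tr file mentions the expected font name is a token search over its contents. This is a crude sketch: it assumes, as the answer implies, that the font name is stored as plain text in the .tr file, and it does not parse the record layout. Matching whole whitespace-delimited tokens keeps ds-digital from matching inside ds-digitalb. tr_mentions_font and check_tr_file are hypothetical helpers.

```python
import re
from pathlib import Path


def tr_mentions_font(tr_bytes: bytes, font_name: str) -> bool:
    """True if font_name occurs in tr_bytes as a whitespace-delimited token."""
    token = re.escape(font_name.encode("utf-8"))
    # (?<!\S) and (?!\S): the token must be bounded by whitespace or file edges.
    pattern = rb"(?<!\S)" + token + rb"(?!\S)"
    return re.search(pattern, tr_bytes) is not None


def check_tr_file(path: str, font_name: str) -> None:
    """Print whether the .tr file at path mentions font_name."""
    found = tr_mentions_font(Path(path).read_bytes(), font_name)
    status = "found" if found else "NOT found"
    print(f"{path}: font name {font_name!r} {status}")
```

For example, check_tr_file("eng.ds-digital.exp0.box.tr", "ds-digital") run in the training folder reports whether that name appears in the file.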

There is also another, easier way to train Tesseract (link).