When mftraining is executed on my training files, I get the following error message:
PS > mftraining -F font_properties -U unicharset -O lang.unicharset .\eng.ds-digita
l.exp0.box.tr .\eng.ds-digitalb.exp0.box.tr .\eng.ds-digitali.exp0.box.tr
Warning: No shape table file present: shapetable
Reading .\eng.ds-digital.exp0.box.tr ...
Reading .\eng.ds-digitalb.exp0.box.tr ...
Reading .\eng.ds-digitali.exp0.box.tr ...
Font id = -1/0, class id = 1/12 on sample 0
font_id >= 0 && font_id < font_id_map_.SparseSize():Error:Assert failed:in file ..\..\classify\trainingsampleset.cpp, li
ne 622
A dialog from Windows also appears stating "feature training for Tesseract has stopped working". There are several posts around the net adressing this issue, but none of them (That I have tried so far) seems have any solutions to make my data-set go through.
The folder where the mftraining command is executed at contains the following files:
eng.ds-digital.exp0.box
eng.ds-digital.exp0.box.tr
eng.ds-digital.exp0.box.txt
eng.ds-digital.exp0.tif
eng.ds-digitalb.exp0.box
eng.ds-digitalb.exp0.box.tr
eng.ds-digitalb.exp0.box.txt
eng.ds-digitalb.exp0.tif
eng.ds-digitali.exp0.box
eng.ds-digitali.exp0.box.tr
eng.ds-digitali.exp0.box.txt
eng.ds-digitali.exp0.tif
font_properties
unicharset
And the font_properties has the following content (It also ends with a newline as the documentation states):
ds-digital 0 0 0 0 0
ds-digitalb 0 1 0 0 0
ds-digitali 1 0 0 0 0
I've also tried different naming conventions on the font-name on the font_properties (althought the documentation is quite clear it is the font name of the file and not the file name, but some people around the net seems to claim otherwise), and renaming the files so the .tr-files follows the pattern eng.ds-digital*.exp0.tr without anvil.
Edit: I am running on Tesseract 3.02
I was getting same issue and resolved by checking Font name in
eng.ds-digital.exp0.box.tr
should be same as you given infont_properties
file.Example:
echo "ds-digital 0 0 0 0 0" > font_properties
then
eng.ds-digital.exp0.box.tr
should haveds-digital
font name.another easy way to train tesseract link.