I am trying to train Tesseract. For that, I am following the guide "How to Create Traineddata file For Tesseract 4.1.0".
step 1 tesseract eng.ocrb.exp0.jpeg eng.ocrb.exp0 batch.nochop makebox
step 2 tesseract eng.ocrb.exp0.jpeg eng.ocrb.exp0 nobatch box.train
step 3 unicharset_extractor eng.ocrb.exp0.box
step 4 echo “ocrb 0 0 1 0 0” > font_properties
step 5 mftraining –F font_properties –U unicharset –O unicharset eng.ocrb.exp0.tr
step 6 cntraining eng.ocrb.exp0.tr
step 7 mv shapetable eng.shapetable; mv inttemp eng.inttemp; mv pffmtable eng.pffmtable; mv normproto eng.normproto
step 8 combine_tessdata eng.
Here is the output of each command:
step 1
tesseract eng.ocrb.exp0.jpeg eng.ocrb.exp0 batch.nochop makebox
Tesseract Open Source OCR Engine v4.1.1 with Leptonica
Warning: Invalid resolution 0 dpi. Using 70 instead.
Estimating resolution as 141
step 2
tesseract eng.ocrb.exp0.jpeg eng.ocrb.exp0 nobatch box.train
Tesseract Open Source OCR Engine v4.1.1 with Leptonica
Warning: Invalid resolution 0 dpi. Using 70 instead.
Estimating resolution as 141
APPLY_BOXES:
Boxes read from boxfile: 90
Found 90 good blobs.
Generated training data for 3 words
step 3
unicharset_extractor eng.ocrb.exp0.box
Extracting unicharset from box file eng.ocrb.exp0.box
Other case i of I is not in unicharset
Other case d of D is not in unicharset
Other case a of A is not in unicharset
Other case u of U is not in unicharset
Other case t of T is not in unicharset
Mirror > of < is not in unicharset
Other case f of F is not in unicharset
Other case m of M is not in unicharset
Other case s of S is not in unicharset
Other case e of E is not in unicharset
Other case r of R is not in unicharset
Other case o of O is not in unicharset
Other case l of L is not in unicharset
Wrote unicharset file unicharset
step 4
echo “ocrb 0 0 1 0 0” > font_properties
step 5
mftraining –F font_properties –U unicharset –O unicharset eng.ocrb.exp0.tr
Warning: No shape table file present: shapetable
Reading –F ...
Failed to open tr file: –F
Reading font_properties ...
Bad box coordinates in boxfile string! 0 0 1 0 0”
Bad format in tr file, reading box coords
Reading –U ...
Failed to open tr file: –U
Reading unicharset ...
Bad format in tr file, reading fontname, unichar
Bad box coordinates in boxfile string! 0 Common 0
Bad format in tr file, reading box coords
Bad box coordinates in boxfile string! 7 0,255,0,255,0,0,0,0,0,0 Latin 1 0 1 Joined # Joined [4a 6f 69 6e 65 64 ]a
Bad format in tr file, reading box coords
Bad box coordinates in boxfile string! 15 0,255,0,255,0,0,0,0,0,0 Common 2 10 2 |Broken|0|1 # Broken
Bad format in tr file, reading box coords
Bad box coordinates in boxfile string! 5 0,255,0,255,0,0,0,0,0,0 Latin 3 0 3 I # I [49 ]A
Bad format in tr file, reading box coords
Bad box coordinates in boxfile string! 5 0,255,0,255,0,0,0,0,0,0 Latin 4 0 4 D # D [44 ]A
Bad format in tr file, reading box coords
Bad box coordinates in boxfile string! 5 0,255,0,255,0,0,0,0,0,0 Latin 5 0 5 A # A [41 ]A
Bad format in tr file, reading box coords
Bad box coordinates in boxfile string! 5 0,255,0,255,0,0,0,0,0,0 Latin 6 0 6 U # U [55 ]A
Bad format in tr file, reading box coords
Bad box coordinates in boxfile string! 5 0,255,0,255,0,0,0,0,0,0 Latin 7 0 7 T # T [54 ]A
Bad format in tr file, reading box coords
Bad box coordinates in boxfile string! 8 0,255,0,255,0,0,0,0,0,0 Common 8 2 8 1 # 1 [31 ]0
Bad format in tr file, reading box coords
Bad box coordinates in boxfile string! 8 0,255,0,255,0,0,0,0,0,0 Common 9 2 9 0 # 0 [30 ]0
Bad format in tr file, reading box coords
Bad box coordinates in boxfile string! 8 0,255,0,255,0,0,0,0,0,0 Common 10 2 10 9 # 9 [39 ]0
Bad format in tr file, reading box coords
Bad box coordinates in boxfile string! 0 0,255,0,255,0,0,0,0,0,0 Common 11 10 0 < # < [3c ]
Bad format in tr file, reading box coords
Bad box coordinates in boxfile string! 8 0,255,0,255,0,0,0,0,0,0 Common 12 2 12 6 # 6 [36 ]0
Bad format in tr file, reading box coords
Bad box coordinates in boxfile string! 8 0,255,0,255,0,0,0,0,0,0 Common 13 2 13 7 # 7 [37 ]0
Bad format in tr file, reading box coords
Bad box coordinates in boxfile string! 8 0,255,0,255,0,0,0,0,0,0 Common 14 2 14 4 # 4 [34 ]0
Bad format in tr file, reading box coords
Bad box coordinates in boxfile string! 5 0,255,0,255,0,0,0,0,0,0 Latin 15 0 15 F # F [46 ]A
Bad format in tr file, reading box coords
Bad box coordinates in boxfile string! 8 0,255,0,255,0,0,0,0,0,0 Common 16 2 16 2 # 2 [32 ]0
Bad format in tr file, reading box coords
Bad box coordinates in boxfile string! 8 0,255,0,255,0,0,0,0,0,0 Common 17 2 17 3 # 3 [33 ]0
Bad format in tr file, reading box coords
Bad box coordinates in boxfile string! 8 0,255,0,255,0,0,0,0,0,0 Common 18 2 18 5 # 5 [35 ]0
Bad format in tr file, reading box coords
Bad box coordinates in boxfile string! 5 0,255,0,255,0,0,0,0,0,0 Latin 19 0 19 M # M [4d ]A
Bad format in tr file, reading box coords
Bad box coordinates in boxfile string! 5 0,255,0,255,0,0,0,0,0,0 Latin 20 0 20 S # S [53 ]A
Bad format in tr file, reading box coords
Bad box coordinates in boxfile string! 5 0,255,0,255,0,0,0,0,0,0 Latin 21 0 21 E # E [45 ]A
Bad format in tr file, reading box coords
Bad box coordinates in boxfile string! 5 0,255,0,255,0,0,0,0,0,0 Latin 22 0 22 R # R [52 ]A
Bad format in tr file, reading box coords
Bad box coordinates in boxfile string! 5 0,255,0,255,0,0,0,0,0,0 Latin 23 0 23 O # O [4f ]A
Bad format in tr file, reading box coords
Bad box coordinates in boxfile string! 5 0,255,0,255,0,0,0,0,0,0 Latin 24 0 24 L # L [4c ]A
Bad format in tr file, reading box coords
Reading –O ...
Failed to open tr file: –O
Reading unicharset ...
Bad format in tr file, reading fontname, unichar
Bad box coordinates in boxfile string! 0 Common 0
Bad format in tr file, reading box coords
Bad box coordinates in boxfile string! 7 0,255,0,255,0,0,0,0,0,0 Latin 1 0 1 Joined # Joined [4a 6f 69 6e 65 64 ]a
Bad format in tr file, reading box coords
Bad box coordinates in boxfile string! 15 0,255,0,255,0,0,0,0,0,0 Common 2 10 2 |Broken|0|1 # Broken
Bad format in tr file, reading box coords
Bad box coordinates in boxfile string! 5 0,255,0,255,0,0,0,0,0,0 Latin 3 0 3 I # I [49 ]A
Bad format in tr file, reading box coords
Bad box coordinates in boxfile string! 5 0,255,0,255,0,0,0,0,0,0 Latin 4 0 4 D # D [44 ]A
Bad format in tr file, reading box coords
Bad box coordinates in boxfile string! 5 0,255,0,255,0,0,0,0,0,0 Latin 5 0 5 A # A [41 ]A
Bad format in tr file, reading box coords
Bad box coordinates in boxfile string! 5 0,255,0,255,0,0,0,0,0,0 Latin 6 0 6 U # U [55 ]A
Bad format in tr file, reading box coords
Bad box coordinates in boxfile string! 5 0,255,0,255,0,0,0,0,0,0 Latin 7 0 7 T # T [54 ]A
Bad format in tr file, reading box coords
Bad box coordinates in boxfile string! 8 0,255,0,255,0,0,0,0,0,0 Common 8 2 8 1 # 1 [31 ]0
Bad format in tr file, reading box coords
Bad box coordinates in boxfile string! 8 0,255,0,255,0,0,0,0,0,0 Common 9 2 9 0 # 0 [30 ]0
Bad format in tr file, reading box coords
Bad box coordinates in boxfile string! 8 0,255,0,255,0,0,0,0,0,0 Common 10 2 10 9 # 9 [39 ]0
Bad format in tr file, reading box coords
Bad box coordinates in boxfile string! 0 0,255,0,255,0,0,0,0,0,0 Common 11 10 0 < # < [3c ]
Bad format in tr file, reading box coords
Bad box coordinates in boxfile string! 8 0,255,0,255,0,0,0,0,0,0 Common 12 2 12 6 # 6 [36 ]0
Bad format in tr file, reading box coords
Bad box coordinates in boxfile string! 8 0,255,0,255,0,0,0,0,0,0 Common 13 2 13 7 # 7 [37 ]0
Bad format in tr file, reading box coords
Bad box coordinates in boxfile string! 8 0,255,0,255,0,0,0,0,0,0 Common 14 2 14 4 # 4 [34 ]0
Bad format in tr file, reading box coords
Bad box coordinates in boxfile string! 5 0,255,0,255,0,0,0,0,0,0 Latin 15 0 15 F # F [46 ]A
Bad format in tr file, reading box coords
Bad box coordinates in boxfile string! 8 0,255,0,255,0,0,0,0,0,0 Common 16 2 16 2 # 2 [32 ]0
Bad format in tr file, reading box coords
Bad box coordinates in boxfile string! 8 0,255,0,255,0,0,0,0,0,0 Common 17 2 17 3 # 3 [33 ]0
Bad format in tr file, reading box coords
Bad box coordinates in boxfile string! 8 0,255,0,255,0,0,0,0,0,0 Common 18 2 18 5 # 5 [35 ]0
Bad format in tr file, reading box coords
Bad box coordinates in boxfile string! 5 0,255,0,255,0,0,0,0,0,0 Latin 19 0 19 M # M [4d ]A
Bad format in tr file, reading box coords
Bad box coordinates in boxfile string! 5 0,255,0,255,0,0,0,0,0,0 Latin 20 0 20 S # S [53 ]A
Bad format in tr file, reading box coords
Bad box coordinates in boxfile string! 5 0,255,0,255,0,0,0,0,0,0 Latin 21 0 21 E # E [45 ]A
Bad format in tr file, reading box coords
Bad box coordinates in boxfile string! 5 0,255,0,255,0,0,0,0,0,0 Latin 22 0 22 R # R [52 ]A
Bad format in tr file, reading box coords
Bad box coordinates in boxfile string! 5 0,255,0,255,0,0,0,0,0,0 Latin 23 0 23 O # O [4f ]A
Bad format in tr file, reading box coords
Bad box coordinates in boxfile string! 5 0,255,0,255,0,0,0,0,0,0 Latin 24 0 24 L # L [4c ]A
Bad format in tr file, reading box coords
Reading eng.ocrb.exp0.tr ...
Flat shape table summary: Number of shapes = 22 max unichars = 1 number with multiple unichars = 0
Warning: no protos/configs for Joined in CreateIntTemplates()
Warning: no protos/configs for |Broken|0|1 in CreateIntTemplates()
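One possible reading of the step 5 log: the lines "Reading –F ..." and "Failed to open tr file: –F" suggest that mftraining treated –F, –U and –O as input file names rather than options. That would be expected if each one starts with an en dash (–) instead of an ASCII hyphen (-), which can happen when a command is copied from a web page. The sketch below is only an illustration of how to check for that. The helper name has_non_ascii is made up here, and the variable names typed and ascii are placeholders.

```shell
# Hypothetical helper: succeeds if the argument contains any byte outside
# tab, newline and printable ASCII.
has_non_ascii() {
    rest=$(printf '%s' "$1" | LC_ALL=C tr -d '\11\12\40-\176')
    [ -n "$rest" ]
}

# The step 5 command exactly as typed above (en dashes before F, U and O).
typed='mftraining –F font_properties –U unicharset –O unicharset eng.ocrb.exp0.tr'

# The same line with each en dash replaced by an ASCII hyphen.
ascii=$(printf '%s\n' "$typed" | sed 's/–/-/g')

if has_non_ascii "$typed"; then
    echo "typed command contains non-ASCII characters"
fi
echo "$ascii"
```

Only the dashes are changed in this sketch. Whether the rest of the command line is right for this data set is not something the log alone settles.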
step 6
cntraining eng.ocrb.exp0.tr
Reading eng.ocrb.exp0.tr ...
Clustering ...
Writing normproto ...
step 7
mv shapetable eng.shapetable
mv inttemp eng.inttemp
mv pffmtable eng.pffmtable
mv normproto eng.normproto
step 8
combine_tessdata eng.
Combining tessdata files
Error: traineddata file must contain at least (a unicharset fileand inttemp) OR an lstm file.
Error combining tessdata files into eng.traineddata
Version string:4.1.1
3:inttemp:size=163171, offset=192
4:pffmtable:size=194, offset=163363
5:normproto:size=2822, offset=163557
13:shapetable:size=400, offset=166379
23:version:size=5, offset=166779
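The step 8 error asks for a unicharset and an inttemp. The component list printed after it shows inttemp, pffmtable, normproto, shapetable and version, but no unicharset, and step 7 renames four files without touching unicharset. Since combine_tessdata is given the prefix eng., it may be looking for a file named eng.unicharset. The sketch below recreates only the file names in a scratch directory to show the check. The helper missing_components is hypothetical, and whether a rename alone is enough here is an assumption, especially since the unicharset options in step 5 were not parsed as options.

```shell
# Hypothetical helper: print each component named in the error message
# (unicharset, inttemp) that does not exist under the given prefix.
missing_components() {
    for part in unicharset inttemp; do
        [ -f "$1$part" ] || echo "$1$part"
    done
}

# Recreate the relevant file names from this question in a scratch directory.
workdir=$(mktemp -d)
touch "$workdir/eng.inttemp" "$workdir/unicharset"

missing_before=$(missing_components "$workdir/eng.")
echo "missing before rename: ${missing_before##*/}"

# Assumption: unicharset gets the same eng. prefix as the other files.
mv "$workdir/unicharset" "$workdir/eng.unicharset"
missing_after=$(missing_components "$workdir/eng.")
echo "missing after rename: ${missing_after:-none}"

rm -r "$workdir"
```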
Here are the details of the files in my current directory:
ls -l
total 296
-rw-rw-r-- 1 sara sara 163171 Feb 23 17:58 eng.inttemp
-rw-rw-r-- 1 sara sara 2822 Feb 23 18:13 eng.normproto
-rw-rw-r-- 1 sara sara 1575 Feb 23 16:55 eng.ocrb.exp0.box
-rw-rw-r-- 1 sara sara 13806 Jan 27 18:03 eng.ocrb.exp0.jpeg
-rw-rw-r-- 1 sara sara 96509 Feb 23 17:55 eng.ocrb.exp0.tr
-rw-rw-r-- 1 sara sara 194 Feb 23 17:58 eng.pffmtable
-rw-rw-r-- 1 sara sara 400 Feb 23 17:58 eng.shapetable
-rw-rw-r-- 1 sara sara 21 Feb 23 17:57 font_properties
-rw-rw-r-- 1 sara sara 1381 Feb 23 17:57 unicharset
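One detail in this listing may be relevant: font_properties is 21 bytes. The text ocrb 0 0 1 0 0 plus a newline is 15 bytes, and two typographic quotes in UTF-8 add 3 bytes each, which gives 21. That would fit the step 5 log line ending in 0 0 1 0 0” and suggests the curly quotes from step 4 were written into the file. The sketch below compares the two forms in a scratch directory; the file names curly and plain are placeholders.

```shell
workdir=$(mktemp -d)

# Step 4 as typed above: typographic quotes are not shell quoting
# characters, so echo writes them into the file as ordinary text.
echo “ocrb 0 0 1 0 0” > "$workdir/curly"

# The same line with straight ASCII double quotes, which the shell removes.
echo "ocrb 0 0 1 0 0" > "$workdir/plain"

# Arithmetic expansion strips any padding that wc adds on some systems.
curly_size=$(($(wc -c < "$workdir/curly")))
plain_size=$(($(wc -c < "$workdir/plain")))
echo "curly quotes: $curly_size bytes, straight quotes: $plain_size bytes"

rm -r "$workdir"
```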
I also followed these tutorials:
https://www.youtube.com/watch?v=1v8BPw0Dn0I
https://pretius.com/blog/ocr-tesseract-training-data/
I am using qt-box-editor on Ubuntu 20.04 LTS, and I corrected the box coordinates in qt-box-editor.
