tesseract combine_tessdata eng. Combining tessdata files Error: traineddata file must contain at least (a unicharset file


I am trying to train Tesseract, and for that I am following this guide: How to Create Traineddata file For Tesseract 4.1.0

step 1 tesseract eng.ocrb.exp0.jpeg eng.ocrb.exp0 batch.nochop makebox

step 2 tesseract eng.ocrb.exp0.jpeg eng.ocrb.exp0 nobatch box.train

step 3 unicharset_extractor eng.ocrb.exp0.box

step 4 echo “ocrb 0 0 1 0 0” > font_properties

step 5 mftraining –F font_properties –U unicharset –O unicharset eng.ocrb.exp0.tr

step 6 cntraining eng.ocrb.exp0.tr

step 7 mv shapetable eng.shapetable; mv inttemp eng.inttemp; mv pffmtable eng.pffmtable; mv normproto eng.normproto

step 8 combine_tessdata eng.
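
Steps 4 and 5 as written above contain typographic characters (curly quotes “ ” and en dashes –) instead of the ASCII `"` and `-` that a shell and the training tools expect. This often happens when commands are copied from a web page. A minimal sketch for checking a command string before running it; the function name `has_typographic_chars` is hypothetical and not part of Tesseract:

```shell
# Hypothetical helper: succeeds (exit 0) if the argument contains an
# en dash or a curly quote. A shell does not treat these as quoting
# characters, and command-line tools do not see them as option prefixes.
has_typographic_chars() {
  case "$1" in
    *–*|*“*|*”*|*‘*|*’*) return 0 ;;
    *) return 1 ;;
  esac
}

cmd='mftraining –F font_properties –U unicharset –O unicharset eng.ocrb.exp0.tr'
if has_typographic_chars "$cmd"; then
  echo "non-ASCII dash or quote found; retype the options with a plain '-'"
fi
```

If the check fires, retyping the options by hand with a plain hyphen is probably the simplest fix.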

Here is the output of each command:

step 1

tesseract eng.ocrb.exp0.jpeg eng.ocrb.exp0 batch.nochop makebox
Tesseract Open Source OCR Engine v4.1.1 with Leptonica
Warning: Invalid resolution 0 dpi. Using 70 instead.
Estimating resolution as 141

step 2

tesseract eng.ocrb.exp0.jpeg eng.ocrb.exp0 nobatch box.train
Tesseract Open Source OCR Engine v4.1.1 with Leptonica
Warning: Invalid resolution 0 dpi. Using 70 instead.
Estimating resolution as 141
APPLY_BOXES:
   Boxes read from boxfile:      90
   Found 90 good blobs.
Generated training data for 3 words

step 3

unicharset_extractor eng.ocrb.exp0.box
Extracting unicharset from box file eng.ocrb.exp0.box
Other case i of I is not in unicharset
Other case d of D is not in unicharset
Other case a of A is not in unicharset
Other case u of U is not in unicharset
Other case t of T is not in unicharset
Mirror > of < is not in unicharset
Other case f of F is not in unicharset
Other case m of M is not in unicharset
Other case s of S is not in unicharset
Other case e of E is not in unicharset
Other case r of R is not in unicharset
Other case o of O is not in unicharset
Other case l of L is not in unicharset
Wrote unicharset file unicharset

step 4

echo “ocrb 0 0 1 0 0” > font_properties

step 5

mftraining –F font_properties –U unicharset –O unicharset eng.ocrb.exp0.tr
Warning: No shape table file present: shapetable
Reading –F ...
Failed to open tr file: –F
Reading font_properties ...
Bad box coordinates in boxfile string! 0 0 1 0 0”

Bad format in tr file, reading box coords
Reading –U ...
Failed to open tr file: –U
Reading unicharset ...
Bad format in tr file, reading fontname, unichar
Bad box coordinates in boxfile string! 0 Common 0

Bad format in tr file, reading box coords
Bad box coordinates in boxfile string! 7 0,255,0,255,0,0,0,0,0,0 Latin 1 0 1 Joined # Joined [4a 6f 69 6e 65 64 ]a

Bad format in tr file, reading box coords
Bad box coordinates in boxfile string! 15 0,255,0,255,0,0,0,0,0,0 Common 2 10 2 |Broken|0|1 # Broken

Bad format in tr file, reading box coords
Bad box coordinates in boxfile string! 5 0,255,0,255,0,0,0,0,0,0 Latin 3 0 3 I  # I [49 ]A

Bad format in tr file, reading box coords
Bad box coordinates in boxfile string! 5 0,255,0,255,0,0,0,0,0,0 Latin 4 0 4 D  # D [44 ]A

Bad format in tr file, reading box coords
Bad box coordinates in boxfile string! 5 0,255,0,255,0,0,0,0,0,0 Latin 5 0 5 A  # A [41 ]A

Bad format in tr file, reading box coords
Bad box coordinates in boxfile string! 5 0,255,0,255,0,0,0,0,0,0 Latin 6 0 6 U  # U [55 ]A

Bad format in tr file, reading box coords
Bad box coordinates in boxfile string! 5 0,255,0,255,0,0,0,0,0,0 Latin 7 0 7 T  # T [54 ]A

Bad format in tr file, reading box coords
Bad box coordinates in boxfile string! 8 0,255,0,255,0,0,0,0,0,0 Common 8 2 8 1 # 1 [31 ]0

Bad format in tr file, reading box coords
Bad box coordinates in boxfile string! 8 0,255,0,255,0,0,0,0,0,0 Common 9 2 9 0 # 0 [30 ]0

Bad format in tr file, reading box coords
Bad box coordinates in boxfile string! 8 0,255,0,255,0,0,0,0,0,0 Common 10 2 10 9   # 9 [39 ]0

Bad format in tr file, reading box coords
Bad box coordinates in boxfile string! 0 0,255,0,255,0,0,0,0,0,0 Common 11 10 0 <   # < [3c ]

Bad format in tr file, reading box coords
Bad box coordinates in boxfile string! 8 0,255,0,255,0,0,0,0,0,0 Common 12 2 12 6   # 6 [36 ]0

Bad format in tr file, reading box coords
Bad box coordinates in boxfile string! 8 0,255,0,255,0,0,0,0,0,0 Common 13 2 13 7   # 7 [37 ]0

Bad format in tr file, reading box coords
Bad box coordinates in boxfile string! 8 0,255,0,255,0,0,0,0,0,0 Common 14 2 14 4   # 4 [34 ]0

Bad format in tr file, reading box coords
Bad box coordinates in boxfile string! 5 0,255,0,255,0,0,0,0,0,0 Latin 15 0 15 F    # F [46 ]A

Bad format in tr file, reading box coords
Bad box coordinates in boxfile string! 8 0,255,0,255,0,0,0,0,0,0 Common 16 2 16 2   # 2 [32 ]0

Bad format in tr file, reading box coords
Bad box coordinates in boxfile string! 8 0,255,0,255,0,0,0,0,0,0 Common 17 2 17 3   # 3 [33 ]0

Bad format in tr file, reading box coords
Bad box coordinates in boxfile string! 8 0,255,0,255,0,0,0,0,0,0 Common 18 2 18 5   # 5 [35 ]0

Bad format in tr file, reading box coords
Bad box coordinates in boxfile string! 5 0,255,0,255,0,0,0,0,0,0 Latin 19 0 19 M    # M [4d ]A

Bad format in tr file, reading box coords
Bad box coordinates in boxfile string! 5 0,255,0,255,0,0,0,0,0,0 Latin 20 0 20 S    # S [53 ]A

Bad format in tr file, reading box coords
Bad box coordinates in boxfile string! 5 0,255,0,255,0,0,0,0,0,0 Latin 21 0 21 E    # E [45 ]A

Bad format in tr file, reading box coords
Bad box coordinates in boxfile string! 5 0,255,0,255,0,0,0,0,0,0 Latin 22 0 22 R    # R [52 ]A

Bad format in tr file, reading box coords
Bad box coordinates in boxfile string! 5 0,255,0,255,0,0,0,0,0,0 Latin 23 0 23 O    # O [4f ]A

Bad format in tr file, reading box coords
Bad box coordinates in boxfile string! 5 0,255,0,255,0,0,0,0,0,0 Latin 24 0 24 L    # L [4c ]A

Bad format in tr file, reading box coords
Reading –O ...
Failed to open tr file: –O
Reading unicharset ...
Bad format in tr file, reading fontname, unichar
Bad box coordinates in boxfile string! 0 Common 0

Bad format in tr file, reading box coords
Bad box coordinates in boxfile string! 7 0,255,0,255,0,0,0,0,0,0 Latin 1 0 1 Joined # Joined [4a 6f 69 6e 65 64 ]a

Bad format in tr file, reading box coords
Bad box coordinates in boxfile string! 15 0,255,0,255,0,0,0,0,0,0 Common 2 10 2 |Broken|0|1 # Broken

Bad format in tr file, reading box coords
Bad box coordinates in boxfile string! 5 0,255,0,255,0,0,0,0,0,0 Latin 3 0 3 I  # I [49 ]A

Bad format in tr file, reading box coords
Bad box coordinates in boxfile string! 5 0,255,0,255,0,0,0,0,0,0 Latin 4 0 4 D  # D [44 ]A

Bad format in tr file, reading box coords
Bad box coordinates in boxfile string! 5 0,255,0,255,0,0,0,0,0,0 Latin 5 0 5 A  # A [41 ]A

Bad format in tr file, reading box coords
Bad box coordinates in boxfile string! 5 0,255,0,255,0,0,0,0,0,0 Latin 6 0 6 U  # U [55 ]A

Bad format in tr file, reading box coords
Bad box coordinates in boxfile string! 5 0,255,0,255,0,0,0,0,0,0 Latin 7 0 7 T  # T [54 ]A

Bad format in tr file, reading box coords
Bad box coordinates in boxfile string! 8 0,255,0,255,0,0,0,0,0,0 Common 8 2 8 1 # 1 [31 ]0

Bad format in tr file, reading box coords
Bad box coordinates in boxfile string! 8 0,255,0,255,0,0,0,0,0,0 Common 9 2 9 0 # 0 [30 ]0

Bad format in tr file, reading box coords
Bad box coordinates in boxfile string! 8 0,255,0,255,0,0,0,0,0,0 Common 10 2 10 9   # 9 [39 ]0

Bad format in tr file, reading box coords
Bad box coordinates in boxfile string! 0 0,255,0,255,0,0,0,0,0,0 Common 11 10 0 <   # < [3c ]

Bad format in tr file, reading box coords
Bad box coordinates in boxfile string! 8 0,255,0,255,0,0,0,0,0,0 Common 12 2 12 6   # 6 [36 ]0

Bad format in tr file, reading box coords
Bad box coordinates in boxfile string! 8 0,255,0,255,0,0,0,0,0,0 Common 13 2 13 7   # 7 [37 ]0

Bad format in tr file, reading box coords
Bad box coordinates in boxfile string! 8 0,255,0,255,0,0,0,0,0,0 Common 14 2 14 4   # 4 [34 ]0

Bad format in tr file, reading box coords
Bad box coordinates in boxfile string! 5 0,255,0,255,0,0,0,0,0,0 Latin 15 0 15 F    # F [46 ]A

Bad format in tr file, reading box coords
Bad box coordinates in boxfile string! 8 0,255,0,255,0,0,0,0,0,0 Common 16 2 16 2   # 2 [32 ]0

Bad format in tr file, reading box coords
Bad box coordinates in boxfile string! 8 0,255,0,255,0,0,0,0,0,0 Common 17 2 17 3   # 3 [33 ]0

Bad format in tr file, reading box coords
Bad box coordinates in boxfile string! 8 0,255,0,255,0,0,0,0,0,0 Common 18 2 18 5   # 5 [35 ]0

Bad format in tr file, reading box coords
Bad box coordinates in boxfile string! 5 0,255,0,255,0,0,0,0,0,0 Latin 19 0 19 M    # M [4d ]A

Bad format in tr file, reading box coords
Bad box coordinates in boxfile string! 5 0,255,0,255,0,0,0,0,0,0 Latin 20 0 20 S    # S [53 ]A

Bad format in tr file, reading box coords
Bad box coordinates in boxfile string! 5 0,255,0,255,0,0,0,0,0,0 Latin 21 0 21 E    # E [45 ]A

Bad format in tr file, reading box coords
Bad box coordinates in boxfile string! 5 0,255,0,255,0,0,0,0,0,0 Latin 22 0 22 R    # R [52 ]A

Bad format in tr file, reading box coords
Bad box coordinates in boxfile string! 5 0,255,0,255,0,0,0,0,0,0 Latin 23 0 23 O    # O [4f ]A

Bad format in tr file, reading box coords
Bad box coordinates in boxfile string! 5 0,255,0,255,0,0,0,0,0,0 Latin 24 0 24 L    # L [4c ]A

Bad format in tr file, reading box coords
Reading eng.ocrb.exp0.tr ...
Flat shape table summary: Number of shapes = 22 max unichars = 1 number with multiple unichars = 0
Warning: no protos/configs for Joined in CreateIntTemplates()
Warning: no protos/configs for |Broken|0|1 in CreateIntTemplates()
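
In the step 5 log, `–F`, `–U` and `–O` are each announced with "Reading ..." and then "Failed to open tr file", which suggests mftraining took them for `.tr` file names rather than options. As a result, `font_properties` and `unicharset` were also parsed as training files. A small sketch for pulling those names out of log text; `failed_tr_args` is a hypothetical helper, and the sample lines are copied from the output above:

```shell
# Hypothetical helper: print each argument that mftraining reported it
# could not open as a .tr file, reading log text from standard input.
failed_tr_args() {
  sed -n 's/^Failed to open tr file: //p'
}

# Sample lines copied from the step 5 output.
failed_tr_args <<'EOF'
Reading –F ...
Failed to open tr file: –F
Reading font_properties ...
Reading –U ...
Failed to open tr file: –U
Reading –O ...
Failed to open tr file: –O
Reading eng.ocrb.exp0.tr ...
EOF
```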

step 6

cntraining eng.ocrb.exp0.tr
Reading eng.ocrb.exp0.tr ...
Clustering ...

Writing normproto ...

step 7

mv shapetable eng.shapetable
mv inttemp eng.inttemp
mv pffmtable eng.pffmtable
mv normproto eng.normproto

step 8

combine_tessdata eng.
Combining tessdata files
Error: traineddata file must contain at least (a unicharset fileand inttemp) OR an lstm file.
Error combining tessdata files into eng.traineddata
Version string:4.1.1
3:inttemp:size=163171, offset=192
4:pffmtable:size=194, offset=163363
5:normproto:size=2822, offset=163557
13:shapetable:size=400, offset=166379
23:version:size=5, offset=166779
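
The component list printed after the error contains inttemp, pffmtable, normproto and shapetable, but no unicharset. combine_tessdata collects files whose names start with the given prefix (`eng.` here), so a likely cause is that no file named `eng.unicharset` exists. A minimal sketch for checking the two legacy components named in the error message; `missing_components` is a hypothetical helper:

```shell
# Hypothetical helper: list which of the two legacy components named in
# the error message are missing for a given prefix such as "eng.".
missing_components() {
  prefix=$1
  for part in unicharset inttemp; do
    [ -f "${prefix}${part}" ] || echo "${prefix}${part}"
  done
}

# Run in the training directory, with the same prefix as combine_tessdata.
missing_components eng.
```

Anything this prints would need to be created or renamed, for example by writing the mftraining output unicharset to `eng.unicharset`, before step 8 is run again.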

Here are the file details of my current directory:

 ls -l
total 296
-rw-rw-r-- 1 sara sara 163171 Feb 23 17:58 eng.inttemp
-rw-rw-r-- 1 sara sara   2822 Feb 23 18:13 eng.normproto
-rw-rw-r-- 1 sara sara   1575 Feb 23 16:55 eng.ocrb.exp0.box
-rw-rw-r-- 1 sara sara  13806 Jan 27 18:03 eng.ocrb.exp0.jpeg
-rw-rw-r-- 1 sara sara  96509 Feb 23 17:55 eng.ocrb.exp0.tr
-rw-rw-r-- 1 sara sara    194 Feb 23 17:58 eng.pffmtable
-rw-rw-r-- 1 sara sara    400 Feb 23 17:58 eng.shapetable
-rw-rw-r-- 1 sara sara     21 Feb 23 17:57 font_properties
-rw-rw-r-- 1 sara sara   1381 Feb 23 17:57 unicharset
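
The listing shows font_properties at 21 bytes. The intended line `ocrb 0 0 1 0 0` plus a newline is 15 bytes, so the file probably also holds the two curly quotes from step 4, which take 3 bytes each in UTF-8. That would match the mftraining message `Bad box coordinates in boxfile string! 0 0 1 0 0”`. A quick sketch of the arithmetic; `line_bytes` is a hypothetical helper:

```shell
# Hypothetical helper: byte count of a string plus the trailing newline
# that echo would add. tr strips the padding some wc versions print.
line_bytes() {
  printf '%s\n' "$1" | wc -c | tr -d ' '
}

line_bytes 'ocrb 0 0 1 0 0'       # line with the quotes consumed by the shell
line_bytes '“ocrb 0 0 1 0 0”'     # line with curly quotes written literally
```

Running `cat font_properties` in the training directory should show directly whether the quotes ended up in the file.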

I followed these tutorials as well:

https://www.youtube.com/watch?v=1v8BPw0Dn0I
https://pretius.com/blog/ocr-tesseract-training-data/

I am using qt-box-editor on Ubuntu 20.04 LTS.

(Screenshot: the corrected box coordinates in qt-box-editor)

(Image: the original image)
