I am a first-time Tesseract user trying to train Tesseract OCR to recognize handwritten text. I am using NIST Special Database 19, which has over 800,000 images of English handwriting. However, the dataset does not include bounding boxes for the characters or lines, and annotating that many images manually is not feasible.
Is there a way to automate the annotation process?
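As a rough illustration of what such automation could look like, the sketch below assumes that each image contains a single isolated character, that the image has already been loaded and binarized into a grid of rows (0 for background, nonzero for ink), and that the character label is already known from the dataset's own class information. The function names `ink_bounding_box` and `box_line` are hypothetical. The output follows the Tesseract box file layout as I understand it: `symbol left bottom right top page`, with the origin at the bottom-left corner of the image.

```python
def ink_bounding_box(grid):
    """Return (left, bottom, right, top) of the nonzero pixels.

    `grid` is a list of rows, top row first. The result uses a
    bottom-left origin, so row indices are flipped against the height.
    Returns None when the image contains no ink at all.
    """
    height = len(grid)
    rows = [r for r, row in enumerate(grid) if any(row)]
    cols = [c for row in grid for c, value in enumerate(row) if value]
    if not rows:
        return None
    left = min(cols)
    right = max(cols) + 1           # exclusive right edge
    top = height - min(rows)        # flip y: topmost ink row
    bottom = height - (max(rows) + 1)
    return left, bottom, right, top


def box_line(label, grid, page=0):
    """Format one box-file line for a single-character image."""
    box = ink_bounding_box(grid)
    if box is None:
        raise ValueError("image contains no ink pixels")
    return "{} {} {} {} {} {}".format(label, *box, page)


# Tiny 6x5 example image containing a rough letter "C".
example = [
    [0, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 0, 0],
    [0, 0, 1, 0, 0, 0],
    [0, 0, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 0],
]
print(box_line("C", example))
```

This only covers the single-character case. Images with several characters or whole lines would need a segmentation step first, which this sketch does not attempt.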
Thank you for your time.
P.S. If there is a better dataset for this, or if someone has already made a .traineddata file, please let me know. I am open to switching datasets if it means getting better results.