I am using ABBYY FineReader 12 with the Java API for an OCR task that focuses on recognizing IDs matching several well-known patterns (regular expressions).
I am having trouble with the recognition of similar-looking characters. For example, g's sometimes get mistaken for 9's, 0's for O's, 1's for I's, etc.
I think that using regular expression patterns might help overcome this. I was wondering how to incorporate a patterns file into the OCR process to improve the accuracy.
Should I do it with a user patterns file:
IRecognizerParams recognizerParams = engine.CreateDocumentProcessingParams()
.getPageProcessingParams()
.getRecognizerParams();
recognizerParams.setUserPatternsFile("patterns.txt1");
recognizerParams.setTrainUserPatterns(true);
or by adding a regexp type dictionary?
IBaseLanguage lang = engine.CreateLanguageDatabase()
.CreateTextLanguage()
.getBaseLanguages()
.AddNew();
lang.setIsNaturalLanguage(false);
String pattern = "[A-Z0-9]{8}\\d";
lang.getDictionaryDescriptions()
.AddNew(DictionaryTypeEnum.DT_RegularExpression)
.GetAsRegExpDictionaryDescription()
.SetText(pattern);
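To make clear what kind of correction I am hoping for, here is a rough sketch of the same idea done as plain post-processing on the recognized text, outside FineReader. It uses only the JDK; the class name IdPostCorrector and the method correct are made up for illustration, and the confusable table only covers the pairs mentioned above. It assumes the ID pattern from the snippet above (eight uppercase letters or digits followed by one digit).

```java
import java.util.Map;
import java.util.Optional;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the FineReader API.
final class IdPostCorrector {
    // Look-alike characters that should have been digits (g/9, O/0, I/1).
    private static final Map<Character, Character> TO_DIGIT =
            Map.of('g', '9', 'O', '0', 'I', '1');

    // Same pattern as in the question: 8 uppercase alphanumerics, then a digit.
    private static final Pattern ID = Pattern.compile("[A-Z0-9]{8}\\d");

    private static boolean isDigit(char c) {
        return c >= '0' && c <= '9';
    }

    private static boolean isUpper(char c) {
        return c >= 'A' && c <= 'Z';
    }

    // Returns the (possibly corrected) ID, or empty if no valid ID can be formed.
    static Optional<String> correct(String raw) {
        if (ID.matcher(raw).matches()) {
            return Optional.of(raw); // already valid, leave untouched
        }
        if (raw.length() != 9) {
            return Optional.empty(); // wrong length, nothing to substitute
        }
        StringBuilder sb = new StringBuilder(raw);
        for (int i = 0; i < sb.length(); i++) {
            char c = sb.charAt(i);
            boolean mustBeDigit = (i == sb.length() - 1);
            boolean fits = mustBeDigit ? isDigit(c) : (isDigit(c) || isUpper(c));
            // Only touch characters that violate the pattern at this position.
            if (!fits && TO_DIGIT.containsKey(c)) {
                sb.setCharAt(i, TO_DIGIT.get(c));
            }
        }
        String candidate = sb.toString();
        return ID.matcher(candidate).matches()
                ? Optional.of(candidate)
                : Optional.empty();
    }
}
```

For instance, correct("AB12CD34O") would yield AB12CD340, because the last position must be a digit. The obvious limitation is that an O/0 or I/1 mix-up in the first eight positions cannot be detected this way, since both characters are allowed there, which is why I would prefer the engine to use the pattern during recognition itself.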