How to use regular expressions to improve OCR using ABBYY FineReader

I am using ABBYY FineReader 12 with the Java API for an OCR task focused on recognizing IDs that follow several well-known patterns (regular expressions).

I am having trouble with the recognition of similar-looking characters: for example, g's are sometimes mistaken for 9's, 0's for O's, 1's for I's, etc.

I think that regular-expression patterns might help overcome this. How can I incorporate a patterns file into the OCR process to improve accuracy?
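One way to use the ID patterns that does not depend on any FineReader setting is to post-process the recognized text: swap confusable characters until the string fully matches the expected pattern. This is a minimal sketch in plain Java (java.util.regex), not part of the ABBYY API; the class name and the confusion map are hypothetical and would need to reflect the confusions actually observed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

final class ConfusionCorrector {
    // Hypothetical confusion map: recognized character -> characters it may really be.
    private static final Map<Character, String> CONFUSABLE = Map.of(
            'g', "9", '9', "g",
            'O', "0", '0', "O",
            'I', "1", '1', "I");

    /** Returns every variant of the OCR text that fully matches the ID pattern. */
    static List<String> correct(String ocrText, Pattern idPattern) {
        List<String> matches = new ArrayList<>();
        expand(ocrText.toCharArray(), 0, idPattern, matches);
        return matches;
    }

    // Depth-first search: at each position keep the character or try its confusable alternatives.
    private static void expand(char[] chars, int pos, Pattern idPattern, List<String> out) {
        if (pos == chars.length) {
            String candidate = new String(chars);
            if (idPattern.matcher(candidate).matches()) {
                out.add(candidate);
            }
            return;
        }
        char original = chars[pos];
        expand(chars, pos + 1, idPattern, out); // keep the recognized character
        for (char alt : CONFUSABLE.getOrDefault(original, "").toCharArray()) {
            chars[pos] = alt;
            expand(chars, pos + 1, idPattern, out);
        }
        chars[pos] = original; // restore before backtracking
    }

    public static void main(String[] args) {
        Pattern id = Pattern.compile("[A-Z0-9]{8}\\d");
        System.out.println(correct("AB23CD45g", id)); // [AB23CD459]
        System.out.println(correct("AB1OCD34g", id)); // [AB1OCD349, AB10CD349, ABIOCD349, ABI0CD349]
    }
}
```

Each confusable character doubles the number of candidates, which is acceptable for short IDs. The second call also shows the limit of a pattern-only approach: where the pattern allows both characters (here `[A-Z0-9]` accepts both O and 0, and both 1 and I), several candidates survive and something else, such as recognition confidence, has to break the tie.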

Should I do it with a user patterns file:

IRecognizerParams recognizerParams = engine.CreateDocumentProcessingParams()
                                           .getPageProcessingParams()
                                           .getRecognizerParams();
recognizerParams.setUserPatternsFile("patterns.txt1"); // file that stores the user patterns
recognizerParams.setTrainUserPatterns(true);          // switch on user pattern training

or by adding a regular-expression (DT_RegularExpression) dictionary?

IBaseLanguage lang = engine.CreateLanguageDatabase()
                           .CreateTextLanguage()
                           .getBaseLanguages()
                           .AddNew();
lang.setIsNaturalLanguage(false);  // the IDs are not words of a natural language
String pattern = "[A-Z0-9]{8}\\d"; // 8 uppercase letters or digits, then one digit
lang.getDictionaryDescriptions()
    .AddNew(DictionaryTypeEnum.DT_RegularExpression)
    .GetAsRegExpDictionaryDescription()
    .SetText(pattern);
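Before handing the pattern to SetText, it may help to check exactly what it accepts. The sketch below compiles it with java.util.regex; it assumes that ABBYY's regular-expression syntax treats this simple pattern the same way Java does, which should be confirmed in the FineReader Engine documentation. The class name and sample IDs are made up for illustration.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

final class PatternCheck {
    /** Returns the samples that the whole regex matches (matches(), not find()). */
    static List<String> fullMatches(String regex, List<String> samples) {
        Pattern compiled = Pattern.compile(regex);
        return samples.stream()
                .filter(s -> compiled.matcher(s).matches())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String pattern = "[A-Z0-9]{8}\\d"; // same string as passed to SetText above
        // Hypothetical sample IDs, including typical OCR confusions in the last position.
        List<String> samples = List.of("AB0OCD459", "AB23CD459", "AB23CD45O", "AB23CD45g");
        System.out.println(fullMatches(pattern, samples)); // [AB0OCD459, AB23CD459]
    }
}
```

With this pattern only the last character is restricted to a digit, so at best it can help disambiguate g/9 or O/0 in that position; in the first eight positions both readings are equally valid.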