I need to define two regions in an image: one recognized with a regular expression and another with a custom dictionary. For the regex region, I tried to implement the logic from the section `How to attach a dictionary to a recognition language` in the user guide, as shown below, but it does not affect the result at all. Is the following code snippet correct?
IRecognizerParams recognizerParams = engine.CreateRecognizerParams();
ILanguageDatabase languageDatabase = engine.CreateLanguageDatabase();
ITextLanguage textLanguage = languageDatabase.CreateTextLanguage();
IBaseLanguages baseLanguages = textLanguage.getBaseLanguages();
IBaseLanguage baseLanguage = baseLanguages.AddNew();
IDictionaryDescriptions dictionaryDescriptions = baseLanguage.getDictionaryDescriptions();
IDictionaryDescription dictionaryDescription = dictionaryDescriptions.AddNew(DictionaryTypeEnum.DT_RegularExpression);
IRegExpDictionaryDescription regExpDictionaryDescription = dictionaryDescription.GetAsRegExpDictionaryDescription();
regExpDictionaryDescription.SetText("(((|0)[1-9])|([12][0-9])|(30)|(31))\\-(((|0)[1-9])|(10)|(11)|(12))\\-((((19)|(20))[0-9][0-9])|([0-9][0-9]))");
baseLanguage.setAllowWordsFromDictionaryOnly(true);
// baseLanguage.setLetterSet(type, result); // no idea what the result parameter should be
recognizerParams.setTextLanguage(textLanguage);
region.AddRect(0, 100, 500, 125);
region.AddRect(0, 200, 500, 225);
document.getPages().getElement(0).getLayout().getBlocks().AddNew(BlockTypeEnum.BT_Text, region, 0);
document.Recognize( null, null );
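Independently of the engine, the expression itself can be exercised to confirm which strings it is meant to accept. The following is only a sketch using java.util.regex, on the assumption that the constructs used here (grouping, `|`, character ranges, an escaped hyphen) behave the same way in the engine's regular-expression dialect, which should be verified against the SDK documentation. The class and method names are made up for illustration.

```java
import java.util.regex.Pattern;

final class DatePatternCheck {
    // The expression from the snippet above, copied unchanged.
    static final String DATE_EXPRESSION =
        "(((|0)[1-9])|([12][0-9])|(30)|(31))\\-(((|0)[1-9])|(10)|(11)|(12))"
        + "\\-((((19)|(20))[0-9][0-9])|([0-9][0-9]))";

    // True only if the whole candidate string matches the expression.
    // ASSUMPTION: java.util.regex stands in for the engine's own matcher.
    static boolean accepts(String candidate) {
        return Pattern.matches(DATE_EXPRESSION, candidate);
    }

    public static void main(String[] args) {
        String[] samples = {"01-02-2015", "7-12-99", "32-01-2015", "15-13-2015"};
        for (String sample : samples) {
            System.out.println(sample + " -> " + accepts(sample));
        }
    }
}
```

With Java's matcher, the first two samples are accepted and the last two are rejected (day 32 and month 13 fall outside the alternatives). This says nothing about whether the engine applies the dictionary to the block, only that the pattern describes day-month-year dates separated by hyphens.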
For the custom dictionary, we have a word list in Tesseract's .user-words format (one word per line). What is the proper way to consume the .user-words file?
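For reference, reading such a file into a clean list of words needs nothing from the SDK. The sketch below assumes the file is UTF-8 encoded and that blank lines and duplicate entries can be dropped; the class name is hypothetical, and how the resulting words are then handed to the engine's dictionary API is deliberately left out because it is not covered in this thread.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;

final class UserWordsLoader {
    // Trims each line, drops empty lines and repeated words, keeps file order.
    static List<String> clean(List<String> rawLines) {
        return rawLines.stream()
                .map(String::trim)
                .filter(line -> !line.isEmpty())
                .distinct()
                .collect(Collectors.toList());
    }

    // ASSUMPTION: the .user-words file is UTF-8 with one word per line.
    static List<String> load(Path file) throws IOException {
        return clean(Files.readAllLines(file, StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        // args[0] is the path to the word list, supplied by the caller.
        for (String word : load(Paths.get(args[0]))) {
            System.out.println(word);
        }
    }
}
```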
Thanks very much.
Related topic: https://forum.ocrsdk.com/thread/how-to-only-recognize-specified-region-of-the-image-in-java/
2 comments
From the interface file IFRDocument, it seems that only document.Analyze accepts recognizerParams as input, so I added
before the document.Recognize( null, null ); statement.
During execution, the following error occurred:
Having searched for 'alphabet' in the user manual and interface files, I was unable to find any clue to resolve this.
May I know if I'm on the right track?
Thanks very much.
Hi!
Firstly, if you add regions manually using the AddNew() method, you also have to specify recognition parameters for each of them manually:
Secondly, when creating a new BaseLanguage object, it is necessary not only to create and set dictionaries, but also to set an alphabet via the setLetterSet method:
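The code that originally accompanied this point is not preserved here. As a rough sketch only, the alphabet for the date expression in the question would be the ten digits plus the hyphen. The helper below builds that string and checks that a sample value stays within it; the class and method names are hypothetical. The engine call shown in the comment is an assumption: the enum constant name in particular should be checked against the SDK reference.

```java
final class DateAlphabet {
    // Characters the date expression can produce: digits 0-9 and '-'.
    static String build() {
        StringBuilder alphabet = new StringBuilder();
        for (char c = '0'; c <= '9'; c++) {
            alphabet.append(c);
        }
        alphabet.append('-');
        return alphabet.toString();
    }

    // True if every character of text belongs to the given alphabet.
    static boolean covers(String alphabet, String text) {
        return text.chars().allMatch(ch -> alphabet.indexOf(ch) >= 0);
    }

    public static void main(String[] args) {
        String alphabet = build();
        System.out.println(alphabet);
        // ASSUMPTION (not confirmed in this thread): the string would then be
        // passed to the base language along the lines of
        //   baseLanguage.setLetterSet(BaseLanguageLetterSetEnum.BLLS_Alphabet, alphabet);
        System.out.println(covers(alphabet, "31-12-1999"));
    }
}
```

Keeping the alphabet as small as the expression allows is a design choice: characters outside it cannot appear in the recognized text, so a missing character (for example the hyphen) would make valid dates unrecognizable.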
Hope it helps!