Question
How do I properly use regular expressions in FineReader Engine?
Answer
When using regular expressions for recognition, it is strongly recommended to:
- specify the letter set of the language
- specify that the language is not a natural language
- specify that only words from the dictionary are allowed
The following sample code implements these recommendations:
string fileName = Path.GetFileName(filePath);
string outputPath = Path.Combine(_outputDir, fileName);
var frDoc = _engine.CreateFRDocumentFromImage(filePath, null);
var rp = _engine.CreateRecognizerParams();
// Build a custom text language that is driven by a regular expression.
FREngine.LanguageDatabase languageDatabase = _engine.CreateLanguageDatabase();
FREngine.TextLanguage textLang = languageDatabase.CreateTextLanguage();
FREngine.BaseLanguage baseLang = textLang.BaseLanguages.AddNew();
// Recommendation 1: specify the letter set.
baseLang.set_LetterSet(FREngine.BaseLanguageLetterSetEnum.BLLS_Alphabet, "$0123456789,.");
// Recommendation 2: specify that the language is not natural.
baseLang.IsNaturalLanguage = false;
// Recommendation 3: allow only words from the dictionary.
baseLang.AllowWordsFromDictionaryOnly = true;
// Attach the regular expression as the dictionary of the language.
var dictDescr = baseLang.DictionaryDescriptions.AddNew(FREngine.DictionaryTypeEnum.DT_RegularExpression);
dictDescr.GetAsRegExpDictionaryDescription().SetText(@"[$0-9,.]+");
rp.TextLanguage = textLang;
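// Create a text block over the area of interest and assign the recognizer parameters to it.
var region = _engine.CreateRegion();
region.AddRect(375, 21, 465, 31);
frDoc.Pages[0].Layout.Blocks.AddNew(FREngine.BlockTypeEnum.BT_Text, region, 0);
frDoc.Pages[0].Layout.Blocks[0].GetAsTextBlock().RecognizerParams = rp;
// Recognize the document and export the result as Unicode text.
frDoc.Recognize(null, null);
frDoc.Export(outputPath + ".txt", FREngine.FileExportFormatEnum.FEF_TextUnicodeDefaults, null);
// Release the resources held by the document.
frDoc.Close();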
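If several fields need different patterns, the language setup from the sample above could be moved into a small helper. This is only a sketch: the class RegExpLanguageFactory, the method Create, and its parameters are hypothetical names, and it assumes the engine object is exposed as FREngine.IEngine. It uses only the calls that already appear in the sample.

```csharp
// Hypothetical helper, not part of FineReader Engine.
// Assumes the FREngine interop assembly is referenced by the project.
public static class RegExpLanguageFactory
{
    // Builds a text language restricted to the given letter set and regular expression.
    // letterSet and regExp are illustrative parameter names.
    public static FREngine.TextLanguage Create(FREngine.IEngine engine, string letterSet, string regExp)
    {
        FREngine.LanguageDatabase languageDatabase = engine.CreateLanguageDatabase();
        FREngine.TextLanguage textLang = languageDatabase.CreateTextLanguage();
        FREngine.BaseLanguage baseLang = textLang.BaseLanguages.AddNew();

        // Recommendation 1: specify the letter set.
        baseLang.set_LetterSet(FREngine.BaseLanguageLetterSetEnum.BLLS_Alphabet, letterSet);
        // Recommendation 2: the language is not natural.
        baseLang.IsNaturalLanguage = false;
        // Recommendation 3: only words from the dictionary are allowed.
        baseLang.AllowWordsFromDictionaryOnly = true;

        // The regular expression acts as the dictionary of the language.
        var dictDescr = baseLang.DictionaryDescriptions.AddNew(FREngine.DictionaryTypeEnum.DT_RegularExpression);
        dictDescr.GetAsRegExpDictionaryDescription().SetText(regExp);

        return textLang;
    }
}
```

With such a helper, the assignment in the sample would reduce to rp.TextLanguage = RegExpLanguageFactory.Create(_engine, "$0123456789,.", @"[$0-9,.]+");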
The alphabet of FineReader Engine regular expressions is described in the "Working with Regular Expressions" article.