Regular Expressions in FineReader Engine

Question

How do I use regular expressions properly in FineReader Engine?

Answer

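When a regular expression is used to constrain recognition, it is strongly recommended to configure the base language as follows:

  • specify the letter set, so that it contains every character the regular expression can produce
  • mark the language as not natural (IsNaturalLanguage = false)
  • allow only words from the dictionary (AllowWordsFromDictionaryOnly = true), so that only text matching the regular expression is accepted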

The above recommendations are implemented in the following sample code:

string fileName = Path.GetFileName(filePath);
string outputPath = Path.Combine(_outputDir, fileName);
var frDoc = _engine.CreateFRDocumentFromImage(filePath, null);
var rp = _engine.CreateRecognizerParams();

// Build a text language that is constrained by a regular expression
FREngine.LanguageDatabase languageDatabase = _engine.CreateLanguageDatabase();
FREngine.TextLanguage textLang = languageDatabase.CreateTextLanguage();
FREngine.BaseLanguage baseLang = textLang.BaseLanguages.AddNew();
// 1. Letter set: only the characters the regular expression can produce
baseLang.set_LetterSet(FREngine.BaseLanguageLetterSetEnum.BLLS_Alphabet, "$0123456789,.");
// 2. The language is not a natural language
baseLang.IsNaturalLanguage = false;
// 3. Accept only words from the dictionary (here, the regular expression)
baseLang.AllowWordsFromDictionaryOnly = true;
// Attach the regular expression as a dictionary of the base language
var dictDescr = baseLang.DictionaryDescriptions.AddNew(FREngine.DictionaryTypeEnum.DT_RegularExpression);
dictDescr.GetAsRegExpDictionaryDescription().SetText(@"[$0-9,.]+");
rp.TextLanguage = textLang;

// Create a text block over the area of interest and assign the parameters to it
var region = _engine.CreateRegion();
region.AddRect(375, 21, 465, 31);
frDoc.Pages[0].Layout.Blocks.AddNew(FREngine.BlockTypeEnum.BT_Text, region, 0);
frDoc.Pages[0].Layout.Blocks[0].GetAsTextBlock().RecognizerParams = rp;
// Recognize the document and export the result as Unicode text
frDoc.Recognize(null, null);
frDoc.Export(outputPath + ".txt", FREngine.FileExportFormatEnum.FEF_TextUnicodeDefaults, null);
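
As an optional sanity check, the exported text can be compared against the same letter set and pattern after recognition. The sketch below is not part of the FineReader Engine API: it uses only the standard .NET libraries, the helper names UsesOnlyLetterSet and MatchesWholeText are hypothetical, and the test values are made up for illustration. It also assumes that the pattern is valid .NET regular expression syntax, which holds for the simple character class used in the sample above but may not hold for other FineReader Engine regular expressions, because the two syntaxes differ.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Same letter set and pattern as in the sample above.
const string LetterSet = "$0123456789,.";
const string Pattern = @"[$0-9,.]+";

// True when every character of the text belongs to the letter set.
static bool UsesOnlyLetterSet(string text, string letterSet) =>
    text.All(c => letterSet.IndexOf(c) >= 0);

// True when the whole text (not just a substring) matches the pattern.
// Assumption: the pattern is interpreted by the .NET regex engine,
// not by FineReader Engine.
static bool MatchesWholeText(string text, string pattern) =>
    Regex.IsMatch(text, @"\A(?:" + pattern + @")\z");

// Hypothetical recognized values, for illustration only.
foreach (string value in new[] { "$1,234.56", "12a4" })
{
    bool ok = UsesOnlyLetterSet(value, LetterSet) && MatchesWholeText(value, Pattern);
    Console.WriteLine(value + " -> " + (ok ? "accepted" : "rejected"));
}
```

A value that fails this check suggests that the letter set and the regular expression are out of sync, or that the block was recognized with different parameters than intended.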

The symbols that FineReader Engine regular expressions support (the regular expression alphabet) are listed in the Working with Regular Expressions article.
