Regular Expressions

When you use regular expressions for recognition, it is strongly recommended that you:

  • specify the letter set,
  • specify that the language is not a natural language,
  • specify that only words from the dictionary may be used.

All of these recommendations are implemented in the following sample code:

string fileName = Path.GetFileName(filePath);
string outputPath = Path.Combine(_outputDir, fileName);
var frDoc = _engine.CreateFRDocumentFromImage(filePath, null);
var rp = _engine.CreateRecognizerParams();

// Build a custom text language whose dictionary is a regular expression
FREngine.LanguageDatabase languageDatabase = _engine.CreateLanguageDatabase();
FREngine.TextLanguage textLang = languageDatabase.CreateTextLanguage();
FREngine.BaseLanguage baseLang = textLang.BaseLanguages.AddNew();
baseLang.set_LetterSet(FREngine.BaseLanguageLetterSetEnum.BLLS_Alphabet, "$0123456789,.");
baseLang.IsNaturalLanguage = false;
baseLang.AllowWordsFromDictionaryOnly = true;
var dictDescr = baseLang.DictionaryDescriptions.AddNew(FREngine.DictionaryTypeEnum.DT_RegularExpression);
dictDescr.GetAsRegExpDictionaryDescription().SetText(@"[$0-9,.]+");
rp.TextLanguage = textLang;

// Add a text block for the target region, apply the parameters, then recognize and export
var region = _engine.CreateRegion();
region.AddRect(375, 21, 465, 31);
frDoc.Pages[0].Layout.Blocks.AddNew(FREngine.BlockTypeEnum.BT_Text, region, 0);
frDoc.Pages[0].Layout.Blocks[0].GetAsTextBlock().RecognizerParams = rp;
frDoc.Recognize(null, null);
frDoc.Export(outputPath + ".txt", FREngine.FileExportFormatEnum.FEF_TextUnicodeDefaults, null);
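
If several fields need different patterns, the language setup above could be wrapped in a small helper. The following is only a sketch: RegexLanguageFactory and its Create method are hypothetical names, not part of the FineReader Engine API, and the body uses only the calls already shown in the sample. It assumes the FREngine interop assembly is referenced by the project.

```csharp
// Hypothetical helper (not part of the FineReader Engine API).
// It repeats the language setup from the sample above so that the
// letter set and the pattern can be supplied per field.
static class RegexLanguageFactory
{
    public static FREngine.TextLanguage Create(
        FREngine.LanguageDatabase languageDatabase,
        string letterSet,
        string pattern)
    {
        FREngine.TextLanguage textLang = languageDatabase.CreateTextLanguage();
        FREngine.BaseLanguage baseLang = textLang.BaseLanguages.AddNew();

        // Recommendation 1: specify the letter set.
        baseLang.set_LetterSet(FREngine.BaseLanguageLetterSetEnum.BLLS_Alphabet, letterSet);

        // Recommendation 2: the language is not a natural language.
        baseLang.IsNaturalLanguage = false;

        // Recommendation 3: only words from the dictionary may be used.
        baseLang.AllowWordsFromDictionaryOnly = true;

        // The dictionary is the regular expression itself.
        var dictDescr = baseLang.DictionaryDescriptions.AddNew(
            FREngine.DictionaryTypeEnum.DT_RegularExpression);
        dictDescr.GetAsRegExpDictionaryDescription().SetText(pattern);

        return textLang;
    }
}
```

With such a helper, the setup in the sample would reduce to something like:

```csharp
var rp = _engine.CreateRecognizerParams();
rp.TextLanguage = RegexLanguageFactory.Create(
    _engine.CreateLanguageDatabase(), "$0123456789,.", @"[$0-9,.]+");
```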

The regular expression alphabet supported by ABBYY FineReader Engine is described in the Help: Help → Index → Regular expressions.
