How to recognize text containing only numbers

Question

How can I increase recognition accuracy for text that contains only numbers?

Answer

There are two ways to recognize text containing only numbers:

Use the "Digits" predefined text language

Sample code in C#:

// load image
FRDocument document = engineLoader.Engine.CreateFRDocumentFromImage(@"D:\Demo.tif");

// set up processing parameters: restrict recognition to the predefined "Digits" language
DocumentProcessingParams documentProcessingParams = engineLoader.Engine.CreateDocumentProcessingParams();
documentProcessingParams.PageProcessingParams.RecognizerParams.SetPredefinedTextLanguage("Digits");

// process the document using the predefined language
document.Process(documentProcessingParams);

Note: The "Digits" text language also includes other symbols that are common in digit-only fields, such as decimal separators (comma or dot) and the minus sign. The full list of additional characters is:

#$%()+,-./:=[]{}¢£°¼½¾—‹›€
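If recognized values are checked afterwards, the same character set can be reused for validation. The following is a minimal sketch of a hypothetical helper (`DigitsCharset.Fits`, not part of the FineReader Engine API) that reports whether a string uses only the digits 0-9 and the symbols listed above. Ignoring whitespace is an assumption of this sketch, not documented behavior of the "Digits" language.

```csharp
using System;
using System.Linq;

// Hypothetical helper, not part of the FineReader Engine API.
static class DigitsCharset
{
    // Digits 0-9 plus the additional symbols listed for the "Digits" language.
    const string Allowed = "0123456789#$%()+,-./:=[]{}¢£°¼½¾—‹›€";

    // Returns true if every non-whitespace character of text is in Allowed.
    // Skipping whitespace is an assumption made for this sketch.
    public static bool Fits(string text) =>
        text.Where(c => !char.IsWhiteSpace(c)).All(c => Allowed.IndexOf(c) >= 0);
}

class Program
{
    static void Main()
    {
        Console.WriteLine(DigitsCharset.Fits("-1,234.50 €")); // True
        Console.WriteLine(DigitsCharset.Fits("12O5"));        // False: contains the letter O
    }
}
```

A check like this can flag values that contain characters outside the expected set, for example a letter "O" read in place of a zero.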

Specify the alphabet directly

  1. Create a TextLanguage object
  2. Add a BaseLanguage object to it and set the base language's alphabet
  3. Process the document using the created language

Sample code in C#:

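// creating text language
ILanguageDatabase languageDatabase = engineLoader.Engine.CreateLanguageDatabase();
ITextLanguage textLanguage = languageDatabase.CreateTextLanguage();

// creating and setting base language: allow only the digits 0-9
IBaseLanguage baseLanguage = textLanguage.BaseLanguages.AddNew();
baseLanguage.LetterSet[BaseLanguageLetterSetEnum.BLLS_Alphabet] = "0123456789";

// using the created language as the recognition language
DocumentProcessingParams documentProcessingParams = engineLoader.Engine.CreateDocumentProcessingParams();
documentProcessingParams.PageProcessingParams.RecognizerParams.TextLanguage = textLanguage;

// process the document (loaded as in the previous sample) using the created language
document.Process(documentProcessingParams);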
