How to get recognized text from a document in FlexiCapture SDK?

Question

I want to access all background text on recognized pages, for example, to index documents for future searches. How can I do that?

Answer

To do this, call the ExtractTextRegions method of the Page object (IPage) on each recognized page and read the plain text of every text region it returns. For example:

// C#
// Recognize the batch, then print the text of every text region on every page.
batch.Recognize(null, RecognitionModeEnum.RM_ReApplyDocumentDefinitions, null);
foreach (IDocument document in batch.Documents) {
    // Open the document to access its pages.
    document.Open();
    try {
        foreach (IPage page in document.Pages) {
            ITextRegions textRegions = page.ExtractTextRegions();
            foreach (ITextRegion textRegion in textRegions) {
                string plainText = textRegion.Text.PlainText;
                System.Console.WriteLine(plainText);
            }
        }
    } finally {
        // Close the document even if text extraction throws.
        document.Close();
    }
}
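
If the goal is to index documents for later searches rather than print the text, the extracted strings can be fed into an index of your choice. The following is only a minimal sketch of one possible approach, a small in-memory inverted index. The PageTextIndex class, its Add and Find methods, and the "doc1/page1" style page keys are hypothetical names introduced for illustration; they are not part of the FlexiCapture SDK.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper (not part of the FlexiCapture SDK): a tiny in-memory
// inverted index mapping each word, case-insensitively, to the pages that contain it.
public sealed class PageTextIndex
{
    private readonly Dictionary<string, SortedSet<string>> postings =
        new Dictionary<string, SortedSet<string>>(StringComparer.OrdinalIgnoreCase);

    // pageKey is any caller-chosen identifier for a page (an assumption of this sketch).
    public void Add(string pageKey, string plainText)
    {
        if (string.IsNullOrEmpty(plainText)) return;

        var word = new StringBuilder();
        // The trailing space flushes the last word of the text.
        foreach (char c in plainText + " ")
        {
            if (char.IsLetterOrDigit(c))
            {
                word.Append(c);
            }
            else if (word.Length > 0)
            {
                string key = word.ToString();
                if (!postings.TryGetValue(key, out SortedSet<string> pages))
                {
                    pages = new SortedSet<string>(StringComparer.Ordinal);
                    postings[key] = pages;
                }
                pages.Add(pageKey);
                word.Clear();
            }
        }
    }

    // Returns the keys of all pages containing the word, in ordinal order.
    public List<string> Find(string word)
    {
        return postings.TryGetValue(word, out SortedSet<string> pages)
            ? new List<string>(pages)
            : new List<string>();
    }
}
```

With such a helper, the Console.WriteLine call in the loop above could be replaced by a call like index.Add(pageKey, plainText), where pageKey is whatever identifier you build for the current document and page. For real workloads, a dedicated full-text search engine is likely a better fit than this in-memory structure.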

You can find additional information about the ExtractTextRegions method in the FlexiCapture SDK User's Guide.

 

 
