How to extract text layer from PDF without recognition

If your document flow contains PDF files that already have a text layer, you may want to extract that text directly instead of running OCR, which speeds up processing. To do this, set the SourceContentReuseMode property of the ObjectsExtractionParams object to CRM_ContentOnly.

Sample code in C#:

// Assumes the Engine has already been loaded (here via engineLoader) and that
// the FRDocument object "document" has already been created from the PDF file.
// To check whether the PDF actually has a text layer, you can call
// the IsPdfWithTextualContent method of the Engine object beforehand.
FREngine.DocumentProcessingParams docProcParams = engineLoader.Engine.CreateDocumentProcessingParams();

// Reuse the PDF text layer instead of recognizing the page images
FREngine.ObjectsExtractionParams objExtractionParams = docProcParams.PageProcessingParams.ObjectsExtractionParams;
objExtractionParams.SourceContentReuseMode = FREngine.SourceContentReuseModeEnum.CRM_ContentOnly;

// Process the document with these parameters
document.Process(docProcParams);
