How to extract text layer from PDF without recognition

Din Idrisov

Edited June 21, 2023 12:12

Question

How to extract text layer from PDF without recognition?

Answer

If your document flow includes PDFs that already have a text layer, you may want to extract that text without running OCR, which speeds up processing. To do this, set the SourceContentReuseMode property of the ObjectsExtractionParams object to CRM_ContentOnly.

Sample code in C#:

// Assume engineLoader.Engine (the Engine object) and document (an FRDocument)
// are already initialized above.
// Optionally, call the IsPdfWithTextualContent method of the Engine object
// beforehand to check whether the PDF has a text layer.
FREngine.DocumentProcessingParams docProcParams;
docProcParams = engineLoader.Engine.CreateDocumentProcessingParams();

FREngine.ObjectsExtractionParams objExtractionParams;
objExtractionParams = docProcParams.PageProcessingParams.ObjectsExtractionParams;
objExtractionParams.SourceContentReuseMode = FREngine.SourceContentReuseModeEnum.CRM_ContentOnly;

// Process the document; with CRM_ContentOnly the existing text layer is reused instead of running OCR
document.Process(docProcParams);
