How to extract the text layer from a PDF without recognition?


If your document flow contains PDFs that already have a text layer, you may want to extract that text directly, without OCR, to speed up processing. To do this, set the SourceContentReuseMode property of the ObjectsExtractionParams object to CRM_ContentOnly.

The sample code in C#:

// Assume the Engine and FRDocument objects are already initialized above.
// Before processing, you can use the IsPdfWithTextualContent method of the
// Engine object to check whether the document has a text layer.
// Create the document processing parameters.
FREngine.DocumentProcessingParams docProcParams =
    engineLoader.Engine.CreateDocumentProcessingParams();

// Take the text from the existing text layer instead of recognizing it.
FREngine.ObjectsExtractionParams objExtractionParams =
    docProcParams.PageProcessingParams.ObjectsExtractionParams;
objExtractionParams.SourceContentReuseMode =
    FREngine.SourceContentReuseModeEnum.CRM_ContentOnly;

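// Process the document with these parameters, for example by passing
// docProcParams to the Process method of the FRDocument object.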


