How to extract the text layer from a PDF without recognition

Question

How can I extract the text layer from a PDF without recognizing it?

Answer

If your document flow contains PDFs that already have a text layer, you may want to extract that text directly instead of running OCR, which speeds up processing. To do this, set the SourceContentReuseMode property of the ObjectsExtractionParams object to CRM_ContentOnly.

Sample code in C#:

// Assumes the Engine (engineLoader.Engine) and the FRDocument (document)
// have already been initialized.
// To check whether a document has a text layer before processing it,
// use the IsPdfWithTextualContent method of the Engine object.
FREngine.DocumentProcessingParams docProcParams;
docProcParams = engineLoader.Engine.CreateDocumentProcessingParams();

FREngine.ObjectsExtractionParams objExtractionParams;
objExtractionParams = docProcParams.PageProcessingParams.ObjectsExtractionParams;
// Reuse the PDF's existing text layer instead of recognizing the text
objExtractionParams.SourceContentReuseMode = FREngine.SourceContentReuseModeEnum.CRM_ContentOnly;

// Process the document with these parameters
document.Process(docProcParams);