Question
How do I extract the text layer from a PDF without recognition?
Answer
If your document flow contains PDFs that already have a text layer, you may want to extract that text directly instead of running OCR, which speeds up processing. To do this, set the SourceContentReuseMode property of the ObjectsExtractionParams object to CRM_ContentOnly.
Sample code in C#:
// Assumes the Engine (engineLoader.Engine) and the FRDocument (document)
// are already initialized above.
// To check beforehand whether a PDF has a text layer, you can use
// the IsPdfWithTextualContent method of the Engine object.
FREngine.DocumentProcessingParams docProcParams =
    engineLoader.Engine.CreateDocumentProcessingParams();
FREngine.ObjectsExtractionParams objExtractionParams =
    docProcParams.PageProcessingParams.ObjectsExtractionParams;
objExtractionParams.SourceContentReuseMode = FREngine.SourceContentReuseModeEnum.CRM_ContentOnly;
// Process the document; the text is taken from the existing text layer, not from OCR
document.Process(docProcParams);