ObjectsExtractionParams::SourceContentReuseMode. By default it is set to CRM_Auto

Hello, 

When ObjectsExtractionParams::SourceContentReuseMode is set to CRM_Auto, under which circumstances does it decide to reuse the source content rather than perform OCR? I notice that even when the source PDF contains text content, it seems to always run OCR.

Thank you,

Juan

Comments

1 comment

  • Permanently deleted user

    Hi Juan,

    In general, for each text block the algorithm starts with OCR and checks whether the PDF text layer is reliable. Once enough text has been recognized, if the symbols in the text layer are similar to the recognized symbols, the rest of the text block is taken from the text layer. Otherwise, it continues with OCR.

    However, some fonts are difficult to read even for the human eye, so the Engine may choose the second option and keep running OCR.

    If you're sure that the text layer of your document is correct, you can select the CRM_ContentOnly option directly.
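
The per-block check described in the reply can be illustrated with a small sketch. This is not the Engine's actual implementation: the function `extract_block`, the `ocr` callback, the probe size, the similarity threshold, and the use of Python's `difflib` as the similarity measure are all assumptions chosen only to show the idea of "OCR a little, compare with the text layer, then decide".

```python
from difflib import SequenceMatcher
from typing import Callable, Tuple


def extract_block(
    text_layer: str,
    ocr: Callable[[int, int], str],
    block_length: int,
    probe_chars: int = 20,          # assumed: how much text to OCR before deciding
    min_similarity: float = 0.9,    # assumed: how close the text layer must be
) -> Tuple[str, str]:
    """Return (text, source) for one text block.

    `ocr(start, end)` is a hypothetical callback that recognizes the
    characters at positions [start, end) of the block from the page image.
    """
    probe_end = min(probe_chars, block_length)

    # Step 1: always start with OCR on the beginning of the block.
    recognized = ocr(0, probe_end)

    # Step 2: compare the recognized symbols with the text layer.
    layer_prefix = text_layer[:probe_end]
    similarity = SequenceMatcher(None, recognized, layer_prefix).ratio()

    # Step 3: a reliable text layer supplies the rest of the block;
    # otherwise OCR continues to the end of the block.
    if similarity >= min_similarity:
        return recognized + text_layer[probe_end:], "text layer"
    return recognized + ocr(probe_end, block_length), "ocr"


# Usage: a stand-in "page image" and a fake OCR that reads it perfectly.
page_text = "Invoice total: 128.50 EUR, due 2024-01-31"


def fake_ocr(start: int, end: int) -> str:
    return page_text[start:end]


good_layer = page_text                # text layer matches the image
bad_layer = "#" * len(page_text)      # text layer with a broken encoding

good_result = extract_block(good_layer, fake_ocr, len(page_text))
bad_result = extract_block(bad_layer, fake_ocr, len(page_text))
print(good_result[1])
print(bad_result[1])
```

In this sketch, a hard-to-read font corresponds to `ocr` returning symbols that differ from a correct text layer, which pushes the similarity below the threshold and keeps the block on the OCR path. Choosing CRM_ContentOnly corresponds to skipping the comparison and trusting the text layer from the start.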
