
ObjectsExtractionParams::SourceContentReuseMode. By default it is set to CRM_Auto

Written by Permanently deleted user

February 17, 2018 18:02

Hello,

When ObjectsExtractionParams::SourceContentReuseMode is set to CRM_Auto, under which circumstances does the Engine decide to reuse the source content instead of performing OCR? I've noticed that even when the source PDF contains a text layer, it always seems to run OCR.

Thank you,

Juan


Comments


Permanently deleted user

February 22, 2018 08:40
Hi Juan,

In general, for each text block the algorithm starts running OCR and checks whether the PDF text layer is reliable. Once enough characters have been recognized, if the symbols in the text layer match the recognized symbols, the rest of the text block is taken from the text layer. Otherwise, OCR continues for the rest of the block.

However, some fonts are difficult to read even for the human eye, so in such cases the Engine may take the second path and keep running OCR.
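To make the per-block decision easier to picture, here is a toy model in Python. It is only a sketch under stated assumptions, not ABBYY's implementation: the post does not say how much counts as "enough information" or how similarity is measured, so the `probe_len` and `min_agreement` parameters, the `recognize` callback, and the assumption that the text layer is aligned character-by-character with the block are all hypothetical.

```python
from typing import Callable, List, Tuple


def extract_block(recognize: Callable[[int], str], text_layer: str, block_len: int,
                  probe_len: int = 8, min_agreement: float = 0.9) -> Tuple[str, str]:
    """Toy model of the CRM_Auto decision for one text block.

    recognize(i) stands in for OCR of character i (hypothetical).
    text_layer is assumed to be aligned character-by-character with the block.
    Returns (text, source), where source is "text_layer" or "ocr".
    """
    # Step 1: OCR a short prefix of the block (the "probe").
    recognized: List[str] = [recognize(i) for i in range(min(probe_len, block_len))]
    probe = "".join(recognized)

    # Step 2: if the text layer agrees closely enough with the probe,
    # trust it and take the rest of the block from the text layer without OCR.
    if probe and len(text_layer) == block_len:
        agree = sum(a == b for a, b in zip(probe, text_layer))
        if agree / len(probe) >= min_agreement:
            return probe + text_layer[len(probe):], "text_layer"

    # Step 3: otherwise keep running OCR over the remainder of the block.
    recognized.extend(recognize(i) for i in range(len(probe), block_len))
    return "".join(recognized), "ocr"
```

With a fake OCR function that records how many characters it was asked to recognize, a clean text layer stops OCR after the 8-character probe, while a garbled one (here `lnv0ice tota1:`, which agrees on only 6 of the first 8 characters) forces OCR of the whole block:

```python
page = "Invoice total: 1,250.00 EUR"
calls = []

def fake_ocr(i):
    calls.append(i)
    return page[i]

print(extract_block(fake_ocr, page, len(page)), len(calls))
# ('Invoice total: 1,250.00 EUR', 'text_layer') 8
calls.clear()
print(extract_block(fake_ocr, "lnv0ice tota1: 1,250.00 EUR", len(page)), len(calls))
# ('Invoice total: 1,250.00 EUR', 'ocr') 27
```

In this model, a hard-to-read font shows up as low agreement in the probe, which is why the block falls through to full OCR even though a text layer exists.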

If you're sure that the text layer of your document is correct, you can select the CRM_ContentOnly option directly.
