
ObjectsExtractionParams::SourceContentReuseMode. By default it is set to CRM_Auto

Hello, 

When ObjectsExtractionParams::SourceContentReuseMode is set to CRM_Auto, under which circumstances does the Engine decide to reuse the source content rather than perform OCR? I notice that even when the source PDF contains text content, it seems to always perform OCR.

Thank you,

Juan


Comments


    Permanently deleted user

    Hi Juan,

    In general, for each text block the algorithm starts by running OCR and checks whether the PDF text layer is reliable. Once enough of the block has been recognized, if the symbols in the text layer match the recognized symbols, the rest of the text block is taken from the text layer. Otherwise, OCR continues for the rest of the block.

    However, some fonts are hard to read even for the human eye, so the Engine may choose the second option and keep performing OCR.

    If you are sure that the text layer of your document is correct, you can select the CRM_ContentOnly option directly.
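The thread does not give the Engine's internal thresholds, so what follows is only a minimal Python sketch of the per-block decision described above. It is not ABBYY's implementation or API. `ReuseMode`, `extract_block`, `ocr_char`, `SAMPLE_SIZE` and `MIN_AGREEMENT` are hypothetical names, and the model assumes one glyph per text-layer character.

```python
from enum import Enum
from typing import Callable


class ReuseMode(Enum):
    # Hypothetical stand-ins for the CRM_* values discussed in the thread.
    AUTO = "CRM_Auto"
    CONTENT_ONLY = "CRM_ContentOnly"


# Hypothetical tuning values; the Engine's real criteria are not stated here.
SAMPLE_SIZE = 8        # how many characters to OCR before judging the text layer
MIN_AGREEMENT = 0.9    # fraction of sampled characters that must match


def extract_block(
    text_layer: str,
    ocr_char: Callable[[int], str],
    mode: ReuseMode = ReuseMode.AUTO,
) -> tuple[str, str]:
    """Return (text, source) for one text block.

    text_layer: characters the PDF text layer claims for this block.
    ocr_char:   hypothetical callable that recognizes the i-th glyph image.
    """
    if mode is ReuseMode.CONTENT_ONLY:
        # Trust the text layer outright; no OCR is performed.
        return text_layer, "content_only"

    n = len(text_layer)
    if n == 0:
        return "", "reused"

    # Step 1: OCR a prefix of the block.
    sample = min(SAMPLE_SIZE, n)
    recognized = [ocr_char(i) for i in range(sample)]

    # Step 2: compare the recognized symbols with the text layer.
    matches = sum(r == t for r, t in zip(recognized, text_layer))
    if matches / sample >= MIN_AGREEMENT:
        # Text layer looks reliable: take the rest of the block from it.
        return "".join(recognized) + text_layer[sample:], "reused"

    # Step 3: text layer disagrees with what OCR sees, so keep recognizing.
    rest = [ocr_char(i) for i in range(sample, n)]
    return "".join(recognized + rest), "ocr"
```

In this model a correct text layer costs only `SAMPLE_SIZE` OCR calls per block, while a garbled or unusually encoded one forces OCR of every glyph. That is one plausible reason a PDF with text content can appear to "always OCR" under `CRM_Auto`, and why `CRM_ContentOnly` skips recognition entirely.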

