ObjectsExtractionParams::SourceContentReuseMode. By default it is set to CRM_Auto

Written by Permanently deleted user

February 17, 2018, 18:02

Hello,

When ObjectsExtractionParams::SourceContentReuseMode is set to CRM_Auto, under which circumstances does the Engine decide to reuse the source content rather than perform OCR? I notice that even when the source PDF contains text content, it seems to always perform OCR.

Thank you,

Juan

1 comment

Permanently deleted user

February 22, 2018, 08:40
Hi Juan,

In general, for each text block the algorithm starts with OCR and checks whether the PDF text layer is reliable. Once enough of the block has been recognized, if the symbols in the text layer are similar to the recognized symbols, the rest of the text block is taken from the text layer. Otherwise, it continues with OCR.

However, some fonts are difficult to read even for the human eye, so the Engine may choose the second option and keep using OCR for the block.
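As a rough illustration only, the per-block check described above could be sketched like this in Python. This is not the Engine's implementation: the function name, the sample size, and the similarity threshold are all hypothetical, and `difflib.SequenceMatcher` is simply a convenient stand-in for whatever comparison the Engine really uses.

```python
from difflib import SequenceMatcher
from typing import Tuple

# Hypothetical tuning values; the Engine's real criteria are not stated in this thread.
SAMPLE_SIZE = 20       # characters to recognize before judging the text layer
MIN_SIMILARITY = 0.9   # similarity required to trust the text layer


def resolve_block_text(layer_text: str, ocr_text: str,
                       sample_size: int = SAMPLE_SIZE,
                       min_similarity: float = MIN_SIMILARITY) -> Tuple[str, str]:
    """Return (text, source) for one text block.

    ocr_text stands in for what OCR would produce for the whole block.
    A real engine would recognize incrementally and stop once the layer is trusted.
    """
    ocr_sample = ocr_text[:sample_size]
    layer_sample = layer_text[:sample_size]
    similarity = SequenceMatcher(None, ocr_sample, layer_sample).ratio()

    if layer_sample and similarity >= min_similarity:
        # Text layer looks reliable: keep the recognized start and take
        # the rest of the block from the text layer.
        return ocr_sample + layer_text[len(ocr_sample):], "text_layer"

    # Text layer does not match what was recognized: keep using OCR.
    return ocr_text, "ocr"


if __name__ == "__main__":
    good_layer = "The quick brown fox jumps over the lazy dog."
    bad_layer = "Xq#7 zzkp@ !!wv 0000 jumps over the lazy dog."
    ocr = "The quick brown fox jumps over the 1azy d0g."
    print(resolve_block_text(good_layer, ocr)[1])
    print(resolve_block_text(bad_layer, ocr)[1])
```

In this sketch, a block whose text layer disagrees with the first recognized characters (for example because of an unusual font or a broken encoding in the PDF) falls back to OCR for the whole block, which matches the behaviour you are seeing.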

If you're sure that the text layer of your document is correct, you can select the CRM_ContentOnly option right away.
