Hello,
I just noticed something strange.
If I do this:
IDocumentProcessingParams docProcessingParams = engine.CreateDocumentProcessingParams();
docProcessingParams.getPageProcessingParams().getPagePreprocessingParams().setCorrectOrientation(true);
engine.LoadPredefinedProfile("TextExtraction_Accuracy");
document.Process(docProcessingParams);
it gives a totally different result than if I do this:
engine.LoadPredefinedProfile("TextExtraction_Accuracy");
IDocumentProcessingParams docProcessingParams = engine.CreateDocumentProcessingParams();
docProcessingParams.getPageProcessingParams().getPagePreprocessingParams().setCorrectOrientation(true);
document.Process(docProcessingParams);
The first example preserves the layout of the document; the second one does not.
This is quite noticeable when there are tables.
Is this a bug, or is the order important?
Comments
Yes, the order is important. As stated in Developer's Help->Specifications->Predefined Profiles Specification: "All objects created after the profile is loaded will have these properties set to the specified values".
So, when you load the TextExtraction profile before creating IDocumentProcessingParams, the behavior you see is expected: all the settings from this profile are used for processing, and as a result, the layout is not preserved.
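To make the ordering rule quoted above concrete, here is a minimal toy model in plain Java. It does not use the ABBYY API: ToyEngine, ToyParams, ProfileOrderDemo and the string keys are hypothetical names invented for this sketch, and it assumes only what the quote says, namely that a profile changes the values given to objects created after it is loaded.

```java
import java.util.HashMap;
import java.util.Map;

// Toy stand-in for the engine. NOT the ABBYY API, only a model of the quoted rule.
class ToyEngine {
    // Values that newly created parameter objects will copy.
    private final Map<String, Boolean> defaults = new HashMap<>();

    ToyEngine() {
        defaults.put("EnableTextExtractionMode", false);
        defaults.put("CorrectOrientation", false);
    }

    // Hypothetical profile loader: it only changes defaults for future objects.
    void loadPredefinedProfile(String name) {
        if (name.startsWith("TextExtraction")) {
            defaults.put("EnableTextExtractionMode", true);
        }
    }

    // Each new params object takes a snapshot of the defaults at creation time.
    ToyParams createDocumentProcessingParams() {
        return new ToyParams(defaults);
    }
}

class ToyParams {
    private final Map<String, Boolean> values;

    ToyParams(Map<String, Boolean> defaults) {
        // Copy the map, so a profile loaded later cannot reach this object.
        this.values = new HashMap<>(defaults);
    }

    void set(String key, boolean value) {
        values.put(key, value);
    }

    boolean get(String key) {
        return Boolean.TRUE.equals(values.get(key));
    }
}

class ProfileOrderDemo {
    // Mirrors the first snippet in the question: create params, then load the profile.
    static ToyParams createThenLoad() {
        ToyEngine engine = new ToyEngine();
        ToyParams params = engine.createDocumentProcessingParams();
        params.set("CorrectOrientation", true);
        engine.loadPredefinedProfile("TextExtraction_Accuracy");
        return params;
    }

    // Mirrors the second snippet: load the profile, then create params.
    static ToyParams loadThenCreate() {
        ToyEngine engine = new ToyEngine();
        engine.loadPredefinedProfile("TextExtraction_Accuracy");
        ToyParams params = engine.createDocumentProcessingParams();
        params.set("CorrectOrientation", true);
        return params;
    }

    public static void main(String[] args) {
        System.out.println("create, then load: "
                + createThenLoad().get("EnableTextExtractionMode"));
        System.out.println("load, then create: "
                + loadThenCreate().get("EnableTextExtractionMode"));
    }
}
```

In this model the first ordering leaves the toy EnableTextExtractionMode flag off and the second turns it on, which matches the difference in layout preservation described in the question. The real engine may apply a profile in other ways as well, so treat this only as an illustration of the documented rule.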
Ok I see ;)
Still, is it possible to get the benefits of the TextExtraction_Accuracy profile AND preserve the layout?
Well, the TextExtraction_Accuracy profile contains some settings, such as EnableTextExtractionMode=true, which significantly improve text recognition quality but also affect layout preservation.
You could review the settings in the TextExtraction_Accuracy profile and pick the ones that have a positive influence on recognition quality. All of the profile's settings are listed in the above-mentioned article.
In addition, I recommend taking a look at the "Improving Recognition Quality" article in the Developer's Help. I hope it will be useful as well.