Community

Recognizing the first X pages of a document (Answered)

Written by Permanently deleted user

December 05, 2018 23:54

Dear ABBYY,

While using the ProcessPages method of the FREngine API, I noticed that even though I specified the pageIndices parameter (by adding the desired page indices to the corresponding collection), exporting the document failed with a message saying that document synthesis must first be performed on the entire document. The same behavior is described in this thread.

According to the same thread, I can use the SourceContentReuseMode property in my extractionParameters object. If I am not mistaken, however, this would also include the visible text layer in my export.

Thus, I have two questions:

1. Is there a way to recognize and export only the selected pages? Even a few lines of code or a short explanation would be incredibly helpful.

2. If I am right and the document does need to be processed twice, how would that affect the number of pages counted as recognized? We are planning to switch to a CloudSDK license, where the number of recognized pages matters.

Thank you very much in advance.


Comments

4 comments

Permanently deleted user

December 07, 2018 08:52
The answer to your questions depends on your recognition scenario. In general, if you want to process only certain pages of your document, such as the 2nd and 3rd, specify them when adding the image file:

// engineLoader, imagePath and displayMessage come from the surrounding sample code
FREngine.FRDocument document = engineLoader.Engine.CreateFRDocument();

// Add only the selected pages of the image file to the document
displayMessage( "Loading image..." );
FREngine.IIntsCollection pageIndices = engineLoader.Engine.CreateIntsCollection();
pageIndices.Add( 2 );
pageIndices.Add( 3 );
document.AddImageFile( imagePath, null, pageIndices );

// Recognize the document; it now contains only the added pages
displayMessage( "Recognizing..." );
document.Process( null );

With this approach, only 2 license counter units are consumed.

If you have a more specific processing scenario, could you please describe it?

Permanently deleted user

December 07, 2018 13:48
Thank you very much for your answer.

I would like to specify the pages to process in the Process method because I first need to count the pages and then, if the page count reaches a certain threshold, recognize only the first few.

As far as I know, this requires calling document.AddImageFile(file, null, null) and then passing the page indices to the processing method.

If I do that, however, then according to the error message and the thread I linked to, I have to synthesize all pages in addition to processing the ones I need. I would like my export to contain only the processed pages, and I am concerned that the visible text layer (e.g. existing text in a PDF document or in an image) might also be included. Hence my two questions.
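The threshold rule described in this comment could be expressed as a small, pure helper that decides, from the page count alone, which pages to keep. This is only a sketch: `PageSelection.PagesToKeep` is a hypothetical name, not part of FREngine, it assumes zero-based page indices, and it reads "reaches a threshold" as "greater than or equal to".

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of the FREngine API.
public static class PageSelection
{
    // Returns the zero-based indices of the pages to keep (and later recognize).
    // If the document has at least `threshold` pages, only the first `firstPages`
    // pages are kept; otherwise every page is kept.
    public static List<int> PagesToKeep(int pageCount, int threshold, int firstPages)
    {
        if (pageCount < 0 || threshold < 1 || firstPages < 1)
            throw new ArgumentOutOfRangeException();

        int keep = pageCount >= threshold ? Math.Min(firstPages, pageCount) : pageCount;

        var indices = new List<int>(keep);
        for (int i = 0; i < keep; i++)
            indices.Add(i);
        return indices;
    }
}
```

Keeping the decision separate from the engine calls makes the counting rule easy to check without loading a document.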


Permanently deleted user

December 10, 2018 08:58

You can simply delete the unwanted pages with Pages.DeleteAt(...) after loading the document. Pages that are not used in subsequent processing do not consume license counter units.

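// Load every page so that the total page count is known
document.AddImageFile( imagePath, null, null );
MessageBox.Show( "Document contains " + document.Pages.Count + " pages!" );
// Remove an unwanted page (here, the one at index 1) before processing
document.Pages.DeleteAt( 1 );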

Permanently deleted user

December 10, 2018 12:52
Thank you very much, this solves my question.


