
Recognizing the first X pages of a document Answered

Dear ABBYY,

While using the ProcessPages function of the FREngine API, I noticed that even when the pageIndices parameter is specified (by adding the required page indices to the appropriate collection), exporting the document fails with a message that document synthesis must first be performed on the entire document. This is also shown in this thread.

According to the same thread, I can use the SourceContentReuseMode property in my extractionParameters object. If I am not mistaken, however, this would also include the visible text layer in my export.

 

Thus, I have two questions:

1. Is there a way to recognize and export only the selected pages? Even a few lines of code or a short explanation would be incredibly helpful.

2. If I am right and we do need to process the document twice, how would that affect the number of pages counted as recognized? We are planning to switch to a CloudSDK license, where the number of recognized pages matters.

 

Thank you very much in advance.


Comments

4 comments

  • Nadezhda A. Solovyeva

    The answer to your questions depends on your recognition scenario. In general, if you would like to process only certain pages of your document, such as the 2nd and 3rd, load only those pages:

        FREngine.FRDocument document = engineLoader.Engine.CreateFRDocument();

        // Add only the required pages of the image file to the document
        displayMessage( "Loading image..." );
        FREngine.IIntsCollection pageIndices = engineLoader.Engine.CreateIntsCollection();
        pageIndices.Add( 2 );
        pageIndices.Add( 3 );
        document.AddImageFile( imagePath, null, pageIndices );

        // Recognize the document (only the loaded pages are processed)
        displayMessage( "Recognizing..." );
        document.Process( null );

    With this method, only 2 license counter units are used.

    If you have a more specific processing scenario, could you please describe it?
  • robjoy88

    Thank you very much for your answer.

    I would like to specify the pages to process in the Process method, rather than at loading time, because I want to count the pages first and recognize only a few from the beginning if the page count reaches a certain threshold.

    For this, as far as I know, I need to call document.AddImageFile(file, null, null) and specify the page indices in the Process method.

    If I do, however, according to the message I get and the thread I linked to, I need to synthesize all pages in addition to processing the ones I need. I would like to include only the pages I process in my export, but my concern is that the visible text layer, e.g. existing text in a PDF document or in an image, might also be included. Hence my two questions.
  • Nadezhda A. Solovyeva

    You can simply delete the unwanted pages with Pages.DeleteAt(...) after loading the document. Pages that are not used in subsequent processing do not decrease your license counter units.

        document.AddImageFile( imagePath, null, null );
        MessageBox.Show( "Document contains " + document.Pages.Count + " pages!" );
        document.Pages.DeleteAt( 1 );
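
    Building on the snippet above, the scenario from the question (count the pages first, then recognize only the first few once a threshold is reached) could look roughly like the sketch below. It uses only the calls already shown in this thread (AddImageFile, Pages.Count, Pages.DeleteAt, Process). The names pageThreshold and pagesToKeep and their values are hypothetical, and the sketch assumes that page indices in the Pages collection are zero-based, which is worth verifying against the FREngine documentation before relying on it.

    ```csharp
    // Hypothetical values for illustration, not part of the FREngine API
    const int pageThreshold = 50; // documents with at least this many pages are trimmed
    const int pagesToKeep = 3;    // number of leading pages to recognize

    FREngine.FRDocument document = engineLoader.Engine.CreateFRDocument();

    // Load all pages so that the total page count is known
    document.AddImageFile( imagePath, null, null );

    if( document.Pages.Count >= pageThreshold )
    {
        // Delete from the end so the indices of the remaining pages do not shift
        // (assumes zero-based page indices)
        for( int i = document.Pages.Count - 1; i >= pagesToKeep; i-- )
        {
            document.Pages.DeleteAt( i );
        }
    }

    // Only the pages still present in the document are recognized
    document.Process( null );
    ```

    Documents below the threshold are left untouched and processed in full.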
  • robjoy88

    Thank you very much, this answers my question.
