Hi,
I want to process only a few pages of a large PDF. I can't get IFRDocument.ProcessPages to work because I'm not sure how to set up the IIntsCollection.
For now I have the following snippet to OCR only the first and last page:
// Create document
IFRDocument document = engine.CreateFRDocument();
// Add image file to document
document.AddImageFile( imagePath, null, null );
// Get page count
int pagesCount = document.getPages().getCount();
if (pagesCount > 2) {
    // Only first and last page
    IIntsCollection indices = engine.CreateIntsCollection();
    indices.Add(0);
    indices.Add(pagesCount - 1);
    document.ProcessPages(indices, null);
} else {
    // Process full document
    document.Process( null );
}
But that gives me an error:
Document synthesis has not been performed for the page with index 1
Regards
Comments
Hi Koen,
Sorry for the long silence. I've moved your question to a separate post, since it concerns our offline FREngine product rather than Cloud OCR SDK.
The error occurs because multipage output requires document synthesis for all pages before export (the Process… methods include document synthesis), so every page has to be processed. To do that, you can OCR only the first and last pages of your document and process the remaining pages using the visible text layer of the source PDF file. Please use the SourceContentReuseMode property of the ObjectsExtractionParams object for this. Below is a C# code snippet (sorry that it is not in Java, but the idea should be clear) that shows how to implement this scenario:
Hope this will be useful!
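For Java readers, the per-page decision described in this reply (OCR the first and last page, reuse the text layer for the rest) can be sketched as plain Java. This is only an illustration of the planning logic: the class PagePlan and the Mode labels are hypothetical names and are not FREngine constants or API calls. The actual switch would be made through SourceContentReuseMode as described above.

```java
import java.util.ArrayList;
import java.util.List;

public class PagePlan {
    // Hypothetical labels for illustration only; these are not FREngine constants.
    enum Mode { OCR, REUSE_TEXT_LAYER }

    // Assigns a mode to every page so that no page is skipped
    // before document synthesis.
    static List<Mode> plan(int pageCount) {
        List<Mode> modes = new ArrayList<>();
        for (int i = 0; i < pageCount; i++) {
            boolean firstOrLast = (i == 0 || i == pageCount - 1);
            modes.add(firstOrLast ? Mode.OCR : Mode.REUSE_TEXT_LAYER);
        }
        return modes;
    }

    public static void main(String[] args) {
        // prints [OCR, REUSE_TEXT_LAYER, REUSE_TEXT_LAYER, OCR]
        System.out.println(plan(4));
    }
}
```

The point of the sketch is that every page index ends up with some processing mode, which is what allows synthesis and export to cover the whole document.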
How do I use the GetPagesToProcess function of IFileAdapter in the Hello C# code of FineReader Engine 12? And can you explain why I have to do document synthesis in the above code?
Hi!
As we already answered in the post, please see our standard BatchProcessing code sample in C#.
Document processing in ABBYY FineReader Engine consists of several steps: page preprocessing, analysis, recognition, page synthesis, document synthesis, and export. At the document synthesis stage, the font styles and the logical structure of the document are recreated. This stage is required before the export stage. During export, recognized documents are saved to files in suitable formats.
Hope this information will be useful.
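The stage order listed above, and why export fails when one page has skipped document synthesis, can be modeled with a small Java sketch. The stage names follow the list in this reply, but the enum, the extra NOT_PROCESSED value, and the method firstBlockingPage are hypothetical and are not part of the FREngine API.

```java
public class Stages {
    // Ordered as in the reply above; NOT_PROCESSED is an added placeholder.
    enum Stage {
        NOT_PROCESSED, PREPROCESSING, ANALYSIS, RECOGNITION,
        PAGE_SYNTHESIS, DOCUMENT_SYNTHESIS, EXPORT
    }

    // Returns the index of the first page that has not reached document
    // synthesis, or -1 if every page is ready for export.
    static int firstBlockingPage(Stage[] lastCompletedPerPage) {
        for (int i = 0; i < lastCompletedPerPage.length; i++) {
            if (lastCompletedPerPage[i].compareTo(Stage.DOCUMENT_SYNTHESIS) < 0) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Three pages where only the first and last were processed.
        Stage[] pages = {
            Stage.DOCUMENT_SYNTHESIS, Stage.NOT_PROCESSED, Stage.DOCUMENT_SYNTHESIS
        };
        System.out.println(firstBlockingPage(pages)); // prints 1
    }
}
```

This mirrors the situation in the original question: with only the first and last pages processed, the page with index 1 is the first one that blocks export.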
How can we do the same using Java? I have a multi-page PDF document on which I have to apply FineReader to convert it into an editable format.
Hi, we are trying to limit the page range with the code below. We pass the start and end of the page range to digitize, but for some PDFs a range of 1-2 digitizes the whole document, and for others a range of 2-3 digitizes pages 1 to 3 when it should digitize only pages 2 to 3. I don't know what is going wrong; please review the code below.
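The code referred to here is not shown in the thread, so the actual cause cannot be determined from it. One thing that may be worth checking is the conversion from the 1-based page numbers a user enters to the 0-based indices used in the first snippet of this thread (where the first page is added as index 0). A minimal Java sketch of that conversion follows; PageRange and toZeroBased are hypothetical helper names, not FREngine API.

```java
import java.util.ArrayList;
import java.util.List;

public class PageRange {
    // Converts a 1-based inclusive page range to 0-based page indices.
    static List<Integer> toZeroBased(int firstPage, int lastPage, int pageCount) {
        if (firstPage < 1 || lastPage < firstPage || lastPage > pageCount) {
            throw new IllegalArgumentException(
                "Invalid range " + firstPage + "-" + lastPage
                + " for " + pageCount + " pages");
        }
        List<Integer> indices = new ArrayList<>();
        for (int page = firstPage; page <= lastPage; page++) {
            indices.add(page - 1);
        }
        return indices;
    }

    public static void main(String[] args) {
        System.out.println(toZeroBased(2, 3, 5)); // prints [1, 2]
    }
}
```

If the resulting indices are correct and extra pages are still processed, the problem lies elsewhere, for example in how the remaining pages are handled before synthesis and export.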
Hi,
What is the use of SourceContentReuseMode?
Hi Rama
SourceContentReuseMode is described in the ABBYY FineReader documentation:
https://knowledgebase.abbyy.com/article/1581
And:
Best regards
Koen de Leijer