Hi
When we try to process a specific PDF with the ABBYY FREngine Java API, we get the following error message:
com.abbyy.FREngine.EngineException: The PDF file `invoice.pdf` has unsupported format and cannot be opened.
at com.abbyy.FREngine.IFRDocument.AddImageFile(Native Method)
Details about our installation of ABBYY FineReader Engine:
- Debian 8.11 (64-bit)
- Java 1.8.0_201 (64-bit)
- FineReader Engine 11.1.14.707470
Java snippet that we use to process the PDFs:
    // Create document
    IFRDocument document = engine.CreateFRDocument();
    /*
     If orientation detection is performed during document processing
     (IPagePreprocessingParams::CorrectOrientation property is TRUE), you can select fast
     orientation detection mode: set the OrientationDetectionMode property of the
     OrientationDetectionParams object to ODM_Fast.
    */
    IDocumentProcessingParams dpp = engine.CreateDocumentProcessingParams();
    dpp.getPageProcessingParams().getPagePreprocessingParams().setCorrectOrientation(true);
    try {
        // Add image file to document
        document.AddImageFile( imagePath, null, null );
        // Process full document
        document.Process(dpp);
        // Save results to pdf using 'balanced' scenario
        IPDFExportParams pdfParams = engine.CreatePDFExportParams();
        pdfParams.setScenario( PDFExportScenarioEnum.PES_Balanced );
        String pdfExportPath = inputfilename + "_ocrred.pdf";
        document.Export( pdfExportPath, FileExportFormatEnum.FEF_PDF, pdfParams );
    } finally {
        // Close document
        document.Close();
    }
Other PDFs are processed successfully; only this specific one fails.
Any suggestions?
Best regards
Koen de Leijer
Comments
Hi
We've found out that the PDFs rejected by ABBYY FineReader have one thing in common:
they all have "PDF Producer" => "Adobe XML Form Library".
According to the Adobe forum, these PDFs are XML documents wrapped inside a PDF: https://forums.adobe.com/thread/391837
We need to OCR these PDFs because valuable information is sometimes contained in the company logo or in an image in the PDF footer.
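To flag such files before they reach AddImageFile, the "PDF Producer" observation above could be turned into a pre-check. The following is only a minimal sketch using the Java 8 standard library, not part of the FineReader Engine API: the class name XfaPdfCheck and the method looksLikeXfaForm are hypothetical, and the extra "/XFA" marker (the key that XFA forms normally carry in their form dictionary) is an assumption added here. It is a heuristic: if the metadata is stored in a compressed object stream or in a different string encoding, the scan will miss it, and a marker may occasionally appear by coincidence in binary data.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of the ABBYY API.
final class XfaPdfCheck {

    // Markers we assume indicate an Adobe XML (XFA) form.
    private static final String[] MARKERS = {
        "Adobe XML Form Library", // producer string observed in the rejected PDFs
        "/XFA"                    // assumed: XFA key in the form dictionary
    };

    private XfaPdfCheck() {
    }

    // Returns true if the raw bytes of the file contain one of the markers.
    // The whole file is read into memory, so this suits normal-sized documents only.
    static boolean looksLikeXfaForm(Path pdf) throws IOException {
        // ISO-8859-1 maps every byte to exactly one char, so binary content
        // survives the conversion and plain-text markers stay searchable.
        String raw = new String(Files.readAllBytes(pdf), StandardCharsets.ISO_8859_1);
        for (String marker : MARKERS) {
            if (raw.contains(marker)) {
                return true;
            }
        }
        return false;
    }
}
```

A caller could run XfaPdfCheck.looksLikeXfaForm(Paths.get(imagePath)) before document.AddImageFile(...) and route flagged files to a separate step, for example rendering the pages to images first so that the logo and footer can still be OCRed.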
Thanks in advance
Koen de Leijer