Hi
When we try to process a specific PDF with the ABBYY FREngine Java API, we get the following error message:
com.abbyy.FREngine.EngineException: The PDF file `invoice.pdf` has unsupported format and cannot be opened.
at com.abbyy.FREngine.IFRDocument.AddImageFile(Native Method)
Details about our installation of ABBYY FineReader Engine:
- Debian 8.11 (64-bit)
- Java 1.8.0_201 (64-bit)
- FineReader Engine 11.1.14.707470
Java snippet that we use to process the PDFs:
// Create document
IFRDocument document = engine.CreateFRDocument();
/*
 If orientation detection is performed during document processing
 (IPagePreprocessingParams::CorrectOrientation property is TRUE), you can select fast
 orientation detection mode: set the OrientationDetectionMode property of the
 OrientationDetectionParams object to ODM_Fast.
*/
IDocumentProcessingParams dpp = engine.CreateDocumentProcessingParams();
// Enable orientation correction (fast detection mode is not set in this snippet)
dpp.getPageProcessingParams().getPagePreprocessingParams().setCorrectOrientation(true);
try {
    // Add image file to document (imagePath and inputfilename are defined elsewhere);
    // this is the call that throws the EngineException for invoice.pdf
    document.AddImageFile( imagePath, null, null );
    // Process the full document
    document.Process(dpp);
    // Save results to PDF using the 'balanced' scenario
    IPDFExportParams pdfParams = engine.CreatePDFExportParams();
    pdfParams.setScenario( PDFExportScenarioEnum.PES_Balanced );
    String pdfExportPath = inputfilename + "_ocrred.pdf";
    document.Export( pdfExportPath, FileExportFormatEnum.FEF_PDF, pdfParams );
} finally {
    // Always close the document, even if an exception was thrown
    document.Close();
}
Other PDFs are processed successfully; only this specific one fails.
Any suggestions?
Best regards
Koen de Leijer
1 comment
Hi
We've found that the PDFs rejected by ABBYY FineReader have one thing in common:
they all have "PDF Producer" set to "Adobe XML Form Library".
According to the Adobe forum, these PDFs are essentially XML (XFA) forms wrapped inside a PDF: https://forums.adobe.com/thread/391837
We need to OCR these PDFs because valuable information is sometimes contained in the company logo or in an image in the PDF footer.
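To catch such files before they reach `AddImageFile`, one option is a cheap pre-check that routes them to a separate queue instead of relying on the exception. The sketch below is a heuristic under stated assumptions: `XfaPdfCheck` is a hypothetical helper (not part of FREngine), and it only scans the raw file bytes for the producer string "Adobe XML Form Library" or an `/XFA` key. It will miss markers stored inside compressed object streams or written as hex/UTF-16 strings, so treat a negative result as "probably fine", not as proof.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/**
 * Hypothetical heuristic pre-check, not part of FREngine: flags PDFs that
 * look like XFA forms. Scans raw bytes only, so markers inside compressed
 * object streams or hex/UTF-16 encoded strings are not detected.
 */
public final class XfaPdfCheck {

    private static final String PRODUCER_MARKER = "Adobe XML Form Library";
    private static final String XFA_KEY = "/XFA";

    private XfaPdfCheck() {}

    /** Returns true if the raw PDF bytes contain either marker in plain ASCII. */
    public static boolean looksLikeXfaPdf(byte[] pdfBytes) {
        // ISO-8859-1 maps every byte to exactly one char, so binary content is
        // preserved and ASCII markers can be found with String.contains.
        String raw = new String(pdfBytes, StandardCharsets.ISO_8859_1);
        return raw.contains(PRODUCER_MARKER) || raw.contains(XFA_KEY);
    }

    /** Convenience overload that reads the whole file into memory. */
    public static boolean looksLikeXfaPdf(Path pdf) throws IOException {
        return looksLikeXfaPdf(Files.readAllBytes(pdf));
    }

    public static void main(String[] args) throws IOException {
        for (String arg : args) {
            Path p = Paths.get(arg);
            String verdict = looksLikeXfaPdf(p) ? "XFA-like, route separately" : "send to FREngine";
            System.out.println(p + ": " + verdict);
        }
    }
}
```

Usage: `java XfaPdfCheck invoice.pdf other.pdf` prints one verdict per file. Checking the file up front keeps the decision independent of the exact wording of the `EngineException` message; files that slip past the heuristic will still fail in `AddImageFile` and can be handled there.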
Thanks in advance
Koen de Leijer