Extracting pictures from the input document using FineReader Engine

Nikolai Kromm

Edited November 20, 2023 13:04

Question

Is it possible to extract only pictures from the document?

Answer

It is possible to save every picture from a document as a separate file. The most straightforward approach is to export the document to HTML format. The pictures are written as "export-name-1.jpg", "export-name-2.jpg", ... "export-name-n.jpg", where export-name is the name of the HTML export file and 1, 2, ... n is the picture number.

The resulting HTML file "export-name.html" can then simply be deleted.
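
The cleanup after the export can be scripted. The following is a minimal Python sketch and not part of the Engine API: the function name collect_exported_pictures and its parameters are hypothetical. It assumes that the pictures are located in the same folder as the HTML file and follow the "export-name-n.jpg" naming described above; if your export places the pictures in a different folder, pass that folder instead.

```python
import re
from pathlib import Path


def collect_exported_pictures(export_dir, export_name, remove_html=True):
    """Return the exported pictures, ordered by picture number.

    Assumption: the pictures are stored in export_dir and are named
    "<export_name>-<n>.jpg". Files that do not match are left untouched.
    """
    export_dir = Path(export_dir)
    pattern = re.compile(re.escape(export_name) + r"-(\d+)\.jpg", re.IGNORECASE)

    numbered = []
    for path in export_dir.iterdir():
        match = pattern.fullmatch(path.name)
        if path.is_file() and match:
            numbered.append((int(match.group(1)), path))

    if remove_html:
        # The HTML file is only a by-product in the picture extraction scenario.
        (export_dir / f"{export_name}.html").unlink(missing_ok=True)

    # Sort numerically so that picture 10 follows picture 9, not picture 1.
    return [path for _, path in sorted(numbered, key=lambda item: item[0])]
```

For example, collect_exported_pictures("C:/output", "export-name") would return the paths of the extracted pictures in that (hypothetical) folder and delete "export-name.html". Python 3.8 or later is required for the missing_ok argument.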

To speed up the processing

Note that the FRDocument.Process() call is equivalent to the following sequence of calls:

document.Preprocess();
document.Analyze();
document.Recognize();
document.Synthesize();

If recognition (OCR) is not required, as in the picture extraction scenario, the FRDocument.Process() method call can be replaced with the following methods:

document.Preprocess();
document.Analyze();
document.Synthesize();

This skips the recognition stage, which is one of the most time-consuming steps.

Note that the document analysis stage cannot be omitted, because this is the step in which the Engine determines where the pictures are located in the document.

To adjust the picture format and file size

Use the HTMLExportParams.PictureExportParams object.
