Extracting pictures from the input document using FineReader Engine

Question

Is it possible to extract only pictures from the document?

Answer

Yes. Every picture in a document can be saved as a separate file. The most straightforward approach is to export the document to HTML format. The pictures are written as "export-name-1.jpg", "export-name-2.jpg", ... "export-name-n.jpg", where export-name is the name of the HTML export file and 1, 2, ... n is the picture number.

The resulting HTML file "export-name.html" can simply be deleted afterwards.
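The cleanup after the export can be sketched in plain Java, independent of the Engine API. This is a minimal sketch under assumptions: the class name ExportedPictures and both method names are hypothetical, the pictures are assumed to be written to the same folder as the HTML file, and any file extension is accepted after the picture number rather than only ".jpg".

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of FineReader Engine.
class ExportedPictures {

    // Returns n for a name of the form "<baseName>-<n>.<ext>", or -1 if the name does not match.
    static int pictureNumber(String fileName, String baseName) {
        Pattern pattern = Pattern.compile(Pattern.quote(baseName) + "-(\\d{1,9})\\.[A-Za-z0-9]+");
        Matcher matcher = pattern.matcher(fileName);
        return matcher.matches() ? Integer.parseInt(matcher.group(1)) : -1;
    }

    // Lists the pictures found next to htmlFile, ordered by picture number,
    // and then deletes the HTML file itself.
    // Assumption: the pictures are in the same folder as the HTML file.
    static List<Path> collectPictures(Path htmlFile) throws IOException {
        String name = htmlFile.getFileName().toString();
        int dot = name.lastIndexOf('.');
        String baseName = dot > 0 ? name.substring(0, dot) : name;
        Path folder = htmlFile.toAbsolutePath().getParent();

        List<Path> pictures = new ArrayList<>();
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(folder)) {
            for (Path entry : entries) {
                // Picture numbers start at 1, so anything below that is not a picture.
                if (pictureNumber(entry.getFileName().toString(), baseName) >= 1) {
                    pictures.add(entry);
                }
            }
        }
        pictures.sort(Comparator.comparingInt(
                picture -> pictureNumber(picture.getFileName().toString(), baseName)));

        Files.deleteIfExists(htmlFile);
        return pictures;
    }
}
```

Calling ExportedPictures.collectPictures with the path of "export-name.html" returns the picture files in numeric order (so picture 2 comes before picture 10) and removes the HTML file.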

 

To speed up processing:

If recognition (OCR) is not required, as in the picture extraction scenario, note that the FRDocument.Process() call is equivalent to the following sequence of calls:

document.Preprocess();
document.Analyze();
document.Recognize();
document.Synthesize();

When recognition is not needed, the FRDocument.Process() call can be replaced with the following calls, which omit Recognize():

document.Preprocess();
document.Analyze();
document.Synthesize();

This saves the time that recognition would otherwise take. Recognition is one of the most time-consuming steps.

Note that the document analysis stage cannot be omitted, because this is the step in which the Engine determines where the pictures are located in the document.

To adjust picture format/filesize:

Use the HTMLExportParams.PictureExportParams object.

Comments

1 comment

    S Lieberam

    I am using the export to HTML function in order to extract pictures in FineReader for Windows. Most of the pictures are exported in .jpg format, which is okay. Some pictures are exported in .png format in a very poor quality. How can I force the export to be in .jpg?
