ABBYY FineReader Engine 11 for Linux - Release 3

Part# 1155/14
Build#  11.1.6.562411

New Features and Improvements

Possibility to extract attachments from PDF files and add attachments to output PDF files

Starting with this release, you can extract attachments from an input PDF file and add attachments to the output PDF file.

Each attachment is represented by a PDFAttachment object, whose methods let you access the attached file by saving it to disk or to global memory.

The PDFAttachment object also provides access to the original file name, the description added by the author, and the binding type of the attachment. The binding can take values from PDFAttachmentBindingEnum: PAB_Annotation and PAB_Document. PAB_Annotation means that the attached file is associated with a specific annotation on a specific page. PAB_Document means that the file is attached to the whole document. Note that for an attachment added via the FineReader Engine API, the binding is always PAB_Document.

All attachments are contained in the FRDocument::PDFAttachments collection. Attachments are extracted from the input PDF document when it is opened, and you can add your own files to be attached to the output PDF file during export. To attach all files of this collection to the output PDF file, set the IPDFExportFeatures::WriteSourceAttachments property to TRUE.
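The binding and export rules above can be modeled in a few lines. This is a plain-Python sketch, not FineReader Engine code: the classes and methods below are hypothetical stand-ins that only mirror the documented behavior of FRDocument::PDFAttachments and IPDFExportFeatures::WriteSourceAttachments. The assumption that nothing is written when the property is FALSE is labeled in the code.

```python
from dataclasses import dataclass, field
from enum import Enum


class Binding(Enum):
    # Mirrors the two documented PDFAttachmentBindingEnum values.
    PAB_ANNOTATION = "PAB_Annotation"  # tied to an annotation on a specific page
    PAB_DOCUMENT = "PAB_Document"      # attached to the whole document


@dataclass
class Attachment:
    # Hypothetical stand-in for a PDFAttachment object.
    file_name: str
    description: str
    binding: Binding


@dataclass
class AttachmentCollection:
    # Hypothetical stand-in for FRDocument::PDFAttachments.
    items: list = field(default_factory=list)

    def load_from_pdf(self, found_in_pdf):
        # Attachments read while opening the input PDF keep their original binding.
        self.items.extend(found_in_pdf)

    def add_file(self, file_name, description=""):
        # Files added through the API are always bound to the whole document.
        self.items.append(Attachment(file_name, description, Binding.PAB_DOCUMENT))

    def files_to_export(self, write_source_attachments):
        # WriteSourceAttachments = TRUE writes every file in the collection;
        # this model assumes nothing is written when it is FALSE.
        return list(self.items) if write_source_attachments else []
```

In this model an attachment read from an annotation keeps PAB_Annotation, while every file added through add_file reports PAB_Document, matching the note above.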

New property IPrepareImageMode::BackgroundFillingColor to fill areas added after skew correction

After skew correction, supplementary areas are added to the edges of the document. By default, FineReader Engine sets the color of these areas automatically.

Now you can specify the color used for filling these areas manually with the IPrepareImageMode::BackgroundFillingColor property.

The default value of this property is -1, which means that the color is determined by ABBYY FineReader Engine automatically.

(HelpDesk request 401344 (for Windows)).
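BackgroundFillingColor is an integer in which -1 means automatic selection. The helper below is a sketch that assumes FineReader Engine's usual packing of an RGB color into one integer, with red in the low byte, then green, then blue (r + 256·g + 65536·b); please check the color description in the Help file before relying on it. The function name is hypothetical.

```python
AUTO_COLOR = -1  # documented default: the engine chooses the fill color itself


def background_filling_color(rgb=None):
    """Return an int suitable for BackgroundFillingColor.

    rgb is an (r, g, b) tuple with components 0..255, or None to keep the
    automatic color. The byte order r + 256*g + 65536*b is an assumption,
    not something stated in these release notes.
    """
    if rgb is None:
        return AUTO_COLOR
    r, g, b = rgb
    for component in (r, g, b):
        if not 0 <= component <= 255:
            raise ValueError(f"color component out of range: {component}")
    return r + 256 * g + 65536 * b
```

Keeping None mapped to -1 makes "leave it to the engine" the default, just like the property itself.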

Possibility to view PDF layers in PDF viewers

A new property, IPDFMRCParams::AssignPdfLayersToMrcPlanes, was added in this release. It controls whether PDF layers are available in the output PDF file. If you set it to TRUE, PDF layers are assigned to MRC image planes, so that in some PDF viewers, such as Adobe Acrobat, you can choose which layers to view.

This property is only taken into account for image-only and image-on-text PDF files.

The default value of this property is FALSE.

New method IImageDocument::RemoveColorObjectsEx

This method is useful for documents with black text on a white background that contain colored elements which do not need to be recognized. For example, for a black-and-white document with colored stamps, only the black-and-white layer can be recognized, and the colored elements can be added to the output document.

The IImageDocument::RemoveColorObjectsEx method removes color objects of the specified hues from the original document.

The hues of the objects to remove are specified in the HSL color model. The method can replace the removed objects with a specified color, and it can also save a separate image containing only the extracted objects.
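To make the HSL hue selection concrete, here is a per-pixel sketch using Python's standard colorsys module. It is not the engine's algorithm: the degree-based ranges, the wrap-around handling (for example, 330 to 30 degrees for reds), and the saturation and lightness thresholds that keep black text and white paper untouched are all assumptions for illustration.

```python
import colorsys


def hue_degrees(rgb):
    """Return (hue in degrees, lightness, saturation) for an 8-bit RGB tuple."""
    r, g, b = (c / 255.0 for c in rgb)
    h, l, s = colorsys.rgb_to_hls(r, g, b)
    return h * 360.0, l, s


def hue_in_range(hue, start, end):
    """True if hue lies in [start, end]; a range may wrap past 360 (e.g. 330..30)."""
    if start <= end:
        return start <= hue <= end
    return hue >= start or hue <= end


def is_removable(rgb, hue_ranges, min_saturation=0.25, min_light=0.1, max_light=0.95):
    """Decide whether a pixel belongs to a colored object that should be removed.

    Near-gray, near-black and near-white pixels have no meaningful hue, so they
    are never removed. All three thresholds are assumed values.
    """
    hue, lightness, saturation = hue_degrees(rgb)
    if saturation < min_saturation or not min_light <= lightness <= max_light:
        return False
    return any(hue_in_range(hue, lo, hi) for lo, hi in hue_ranges)
```

With a red range such as [(330, 30)], a red stamp pixel is selected while black text and blue ink are left alone.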

New advanced language detection mode

A new property, RecognizerParams::LanguageDetectionMode, was added in this release. It is useful for recognizing documents whose language is not known in advance. If language detection is on, the recognition languages are selected from the list of languages specified in the TextLanguage property. Language detection has been significantly improved since the previous release and now turns on automatically when necessary.

This property can take three values: TSPV_Auto, TSPV_No, and TSPV_Yes. The default value is TSPV_Auto. In this mode, ABBYY FineReader Engine automatically determines whether language detection should be used, depending on the situation.

Automatic detection is useful only if the TextLanguage set contains a combination of CJK and European languages. The new feature increases the speed and improves the quality of recognition for documents in European languages.

If you know that all languages specified in the TextLanguage set are present in the document you are processing, we recommend setting RecognizerParams::LanguageDetectionMode to TSPV_No.

The old RecognizerParams::DetectLanguage property is now deprecated.
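The three modes and the guidance above can be summarized as a small decision function. This is only an illustrative model of the documented guidance, not the engine's internal logic: the function name and the two language groupings are assumptions (the names are meant to resemble the engine's predefined language names), and in TSPV_Auto mode the real engine decides for itself.

```python
# Assumed groupings for illustration; extend them to match your TextLanguage set.
CJK = {"ChinesePRC", "ChineseTaiwan", "Japanese", "Korean", "KoreanHangul"}
EUROPEAN = {"English", "German", "French", "Spanish", "Italian", "Russian"}


def detection_applies(mode, text_languages):
    """Model of whether language detection is used for a LanguageDetectionMode value.

    mode is "TSPV_Yes", "TSPV_No" or "TSPV_Auto".
    """
    if mode == "TSPV_Yes":
        return True
    if mode == "TSPV_No":
        return False
    if mode != "TSPV_Auto":
        raise ValueError(f"unknown mode: {mode}")
    languages = set(text_languages)
    # Per the release notes, automatic detection only pays off for mixed CJK + European sets.
    return bool(languages & CJK) and bool(languages & EUROPEAN)
```

For a purely European set, the model keeps detection off under TSPV_Auto; when every listed language is known to occur in the document, TSPV_No avoids the detection step entirely.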

New predefined filter FNF_PDF in FontNamesFiltersEnum

FontNamesFiltersEnum constants describe predefined filters of font family names. These filters specify the set of fonts to be used during document synthesis. A new value, FNF_PDF, was added to FontNamesFiltersEnum. If this filter is applied, document synthesis uses the font families whose names are specified in the resources of the input PDF file. However, the fonts themselves are not extracted from the PDF file; they must be installed on the workstation to be used.
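Because FNF_PDF only narrows synthesis to font families named in the PDF resources, a family can actually be used only if it is also installed. The hypothetical helper below (not part of the engine API) predicts which names will be usable; the case-insensitive, duplicate-free comparison is an assumption, and the engine's own matching rules may differ.

```python
def usable_pdf_fonts(pdf_font_names, installed_families):
    """Return PDF font family names that are also installed, in PDF order.

    Comparison ignores case and drops duplicates (assumed matching rules).
    """
    installed = {name.casefold() for name in installed_families}
    seen = set()
    usable = []
    for name in pdf_font_names:
        key = name.casefold()
        if key in installed and key not in seen:
            seen.add(key)
            usable.append(name)
    return usable
```

A font named in the PDF but missing from the workstation, such as MingLiU on a machine without it, simply drops out of the result.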

New property IFRDocument::PDFFontNames

The new read-only property IFRDocument::PDFFontNames returns the collection of font names extracted from the PDF file.

Skew correction at the preprocessing stage

Now it is possible to correct skew at the preprocessing stage instead of during image opening.

The CorrectSkew property was added to the PagePreprocessingParams object. The type of skew correction is defined by the CorrectSkewMode property.

Advanced IsEmptyEx method for checking if the page is empty

IsEmptyEx is a new method that allows you to specify additional parameters for empty page detection.

With the EmptyPageDetectionParams object, you can define the number of letters and text objects that a page can contain and still be considered empty. You can also set the maximum black percentage and specify whether the page must be searched for barcodes. In addition, you can set the page rectangle to be checked, so that any garbage in the margins does not affect the result.

The IEngine::CreateEmptyPageDetectionParams method was added to the API to create the EmptyPageDetectionParams object.
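The way these criteria combine can be sketched as a plain-Python model. The field names below are hypothetical and only mirror the settings described above; they are not the actual EmptyPageDetectionParams property names. The model also assumes that the black percentage is already measured inside the checked rectangle.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class EmptyPageCriteria:
    # Hypothetical mirror of the EmptyPageDetectionParams settings.
    max_letters: int = 0
    max_text_objects: int = 0
    max_black_percent: float = 1.0
    search_barcodes: bool = True
    rect: Optional[Tuple[int, int, int, int]] = None  # (left, top, right, bottom); None = whole page


@dataclass
class PageObject:
    kind: str  # "letter", "text_object" or "barcode"
    x: int
    y: int


def _inside(obj, rect):
    if rect is None:
        return True
    left, top, right, bottom = rect
    return left <= obj.x < right and top <= obj.y < bottom


def is_page_empty(objects, black_percent, criteria):
    """Model of an IsEmptyEx-style check; objects outside criteria.rect are ignored."""
    visible = [o for o in objects if _inside(o, criteria.rect)]
    if criteria.search_barcodes and any(o.kind == "barcode" for o in visible):
        return False
    letters = sum(o.kind == "letter" for o in visible)
    text_objects = sum(o.kind == "text_object" for o in visible)
    return (letters <= criteria.max_letters
            and text_objects <= criteria.max_text_objects
            and black_percent <= criteria.max_black_percent)
```

Restricting the rectangle is what lets margin noise, such as a few speckles read as letters near the page edge, leave the verdict unchanged.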

WibuKey support

This release supports WibuKey hardware protection keys. To generate licenses with the CodeMeter Key license storage, new price lists must be used.

Important! Please note that Wibu drivers are not included in the FRE 11 for Linux R3 distribution. Users should download and install them manually. To do so:

  1. Download the drivers from http://www.wibu.com/downloads-user-software.html by clicking Download for the CodeMeter Runtime for Linux user item.
  2. Install the Wibu drivers on the computer where the Licensing Service is installed.
  3. Connect the Wibu key to a USB port of the computer. To view license properties, use the License Manager utility.

ABBYY plans to include these drivers in the R4 distribution. The documentation will also be updated with detailed information on Wibu key usage.

Possibility to export TIFF files with one strip

A strip is a subsection of the image composed of one or more rows. A TIFF image may be composed of one or more strips. Now you can produce a TIFF file composed of a single strip by setting ITiffExtendedParams::WriteSingleStrip to TRUE. These TIFF parameters can also be passed to the IImage::WriteToFile method.

This feature was implemented because some archive writers do not support TIFF files with several strips, even though such files are valid.

(HelpDesk request 421941 (for Windows))
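The number of strips follows from the RowsPerStrip field defined in the TIFF 6.0 specification: StripsPerImage = floor((ImageLength + RowsPerStrip - 1) / RowsPerStrip). A single-strip file is therefore one whose RowsPerStrip is at least the image height. The sketch below only illustrates that arithmetic; it does not call FineReader Engine, and the function name is hypothetical.

```python
def strips_per_image(image_length, rows_per_strip=2**32 - 1):
    """Number of strips in a TIFF image, per the TIFF 6.0 RowsPerStrip definition.

    The spec default RowsPerStrip of 2**32 - 1 means the whole image is one strip.
    """
    if image_length <= 0 or rows_per_strip <= 0:
        raise ValueError("image_length and rows_per_strip must be positive")
    return (image_length + rows_per_strip - 1) // rows_per_strip
```

For example, a 1000-row image written with 64 rows per strip needs 16 strips, while setting RowsPerStrip to 1000 (or leaving the spec default) yields exactly one.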

New property ITextLanguage::RecognitionSet

The new ITextLanguage::RecognitionSet property returns the full set of letters used for recognition with this TextLanguage, combining the letter sets of all its base languages and the additional letter sets.

(HelpDesk requests #375306, #420377, #300708 (for Windows))
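Conceptually, RecognitionSet is a set union. The plain-Python sketch below shows that combination with letter sets represented as strings; the function name and string representation are assumptions for illustration, and the real property is read from the TextLanguage object rather than computed by your code.

```python
def recognition_set(base_letter_sets, additional_letter_sets=()):
    """Union of the letters from all base languages and additional letter sets.

    Letter sets are plain strings here (an assumption); the result is returned
    as a sorted string with each letter appearing once.
    """
    letters = set()
    for letter_set in (*base_letter_sets, *additional_letter_sets):
        letters.update(letter_set)
    return "".join(sorted(letters))
```

Overlapping letters from different base languages appear only once, which is why the result can be shorter than the sum of its parts.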

Improved CRM_ContentOnly mode

CRM_ContentOnly mode now uses a new technology for analyzing PDF files. In this mode, only the content of the source PDF file is used: the image is no longer rasterized for recognition, and all information about the text layer, images, separators, etc. is taken directly from the PDF. This mode is designed for PDF files that contain more than just raster elements. This change helps preserve the original layout of the source document and improves the quality of the output. The feature is particularly useful for converting PDF files to Microsoft Office formats.

Compared with the previous release, CRM_ContentOnly mode works faster, and some shortcomings and bugs of this mode have been eliminated.

To use the feature, set IObjectsExtractionParams::SourceContentReuseMode to CRM_ContentOnly and use these parameters at the analysis stage.

Correct display of PDF files with PMingLiU/MingLiU fonts in PDF viewers

Now all exported PDF files with PMingLiU and MingLiU fonts are displayed correctly in all PDF viewers. Previously, some viewers displayed such files incorrectly because of errors in those viewers. As a workaround, these fonts are now embedded in the output PDF file.

Please note that if the PMingLiU/MingLiU font is used for a Chinese-language document, it is embedded in the output PDF file regardless of the value of the IPDFExportFeatures::FontEmbeddingMode property. (HelpDesk request #414470 (for Windows))

Information about adding comments to profiles in the Help file

Information and an example on how to add comments were added to the Working with Profiles section of the Help file. Comments can be added by starting a line with a semicolon. (HelpDesk request #406671 (for Windows)).
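A quick way to see the comment rule in action is a small reader for INI-style profile text that skips any line starting with a semicolon. This is a hypothetical sketch, not the engine's profile loader: the section and key in the sample are illustrative assumptions, and only the semicolon rule comes from this note.

```python
def parse_profile(text):
    """Parse INI-style profile text into {section: {key: value}}.

    Lines that start with ';' are comments and are skipped, as are blank lines.
    """
    sections = {}
    current = None
    for raw in text.splitlines():
        if raw.startswith(";"):
            continue  # comment line
        line = raw.strip()
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            current = sections.setdefault(line[1:-1].strip(), {})
        elif "=" in line and current is not None:
            key, value = line.split("=", 1)
            current[key.strip()] = value.strip()
    return sections


sample = """; Profile for mixed Japanese/English documents (hypothetical example)
[RecognizerParams]
; Every listed language is known to occur in these documents
LanguageDetectionMode = TSPV_No
"""
```

Parsing the sample yields only the RecognizerParams section with its single key; both comment lines are ignored.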

Performance results

This section contains performance results of FRE 11 R3 for Linux compared with the FRE 11 R3 for Windows release.

[Performance charts: English, Japanese, Korean, Chinese]

Fixed Bugs

This section contains a list of bugs reported by customers that have been fixed.

A four-point scale helps you evaluate the severity of each issue, so that you can make an informed decision about how important the updates are for your system.

Critical: A bug that causes the software to crash or hang. Critical bugs can include access violations, internal program errors, stack overflows, out-of-memory conditions, or other exceptions that can lead to program failure.
Major: A bug that does not cause program failure but affects major functionality of a feature or impairs system performance. Major bugs can include discrepancies between feature functionality and internal specifications, memory leaks, or data corruption.
Minor: A bug that causes a feature to malfunction or affects minor functionality of the software. Minor bugs can include recognition errors, missing or lost objects, wrong color detection, incorrect document analysis, license counter errors, etc.
Trivial: A cosmetic issue that does not affect the functionality of the product but can cause inconvenience. Trivial bugs can include Help file errors, log errors, incomplete information in error messages, etc.

The following table lists the bugs fixed in this release in descending order of severity. If a bug has a workaround, root cause, or side effect, it is mentioned in the Description column.

 

Severity | Description | Subsystem | HD # | Office
Critical | The distribution does not work on machines with 48 physical cores. | Installation | 426861 | EU
Major | Only one recognition variant is saved when IRecognizerParams::SaveCharacterRecognitionVariants is set to TRUE. | Recognizer | 413018 | US
Major | In Java, array index numbering starts with 0 (it should start with 1). | API | 443827 | US
Major | IPE: Src/Fonts/FontDescriptorImpl.cpp, 107. | API | 423822 | US
Trivial | Information about installing and uninstalling FREngine using RPM packages was added to the Help file. | Help | 414469 | US
Trivial | Information about Visual Basic .NET was removed from the Help article. | Help | 445492 | US

Known Issues and Workarounds

Linux version limitations

The following functionality of FineReader Engine 11 for Windows is not available in the Linux version:

•              DjVu opening

•              Scanning

•              ICR/OMR

•              Visual Components and other GUI elements

•              WDP/WIC/BITMAP input formats and other Windows-specific functionality

Some API is not implemented

The following API is not implemented in FRE 10 and FRE 11:

•              IFootnoteSeries::IsNumberingWithSuperscript. Always returns “false”.

•              IFootnoteSeries::PositionOnPage. Always returns “FPPT_SingleColumnSection”.

•              IFootnoteSeries::PositionInDocument. Always returns “FPDT_PageEnd”.

•              IFootnoteSeries::HasSeparator. Always returns “true”.

•              ITextPicture::ColumnNumber. Always returns “0”.

•              ICharParams::IsWordStart. Returns “false” except for character parameters obtained through the IWordRecognitionVariants interface.

•              IIncut::TextWrapping. Always returns “TW_Undefined”.

•              IRunningTitlesSeriesText::HasSeparator. Always returns “false”.

There are no plans to implement them.

An error during unloading of the FREngine library on SLES 11 SP2 and SP3

An error message is shown during unloading of the FREngine library on SLES 11 SP2 and SP3.

Memory leak during processing of PDF files in Java

Out-of-memory assertions and internal program errors periodically occur when a large number of PDF files is processed.

We recommend using 4 GB of RAM or more for parallel processing of large PDF documents.

Floating point exception (core dumped) when “misaki” fonts are installed.

(HelpDesk request 409065)

PDF/A validation report

The following issues are known for PDF/A files produced by this release of FRE 11:

1.             Adobe Acrobat 11.0.5 (Preflight 11.0.4) detects an error for PDF/A with attachments (PCM_Pdfa_3a):

“Embedded file does not have AF entity”
