ABBYY FineReader Engine 11 for Mac - Release 8

  • Release date: 28 April 2017 (public release)
  • Part #: 1161/27
  • Build #: 11.1.19.872047

Ability to remove garbage from color images

Extended image pre-processing increases the recognition accuracy of color images. Small excess dots, which slow down processing, could already be removed from black-and-white images; it is now possible to remove garbage from color images as well.

Ability to inject a text layer into selected pages of a PDF document

This feature improves flexibility. The ability to inject a text layer under the image has been extended: it is now possible to specify the individual pages into which the text layer should be injected.

Extended method of injecting text into PDF

While injecting a text layer into scanned PDFs, the extended method can deskew the scanned pages and correct their orientation. When processing a batch that contains both scanned and digital-born PDF documents, all scanned PDF images can automatically be given a text layer and turned into searchable PDF files, even documents that were scanned incorrectly.

Extension of method for detecting text layer in PDFs

The method for detecting a text layer in PDFs has been extended. Previously, the method accepted only a string for the first parameter, 'FileName'. It is now also possible to pass a byte array for 'FileName'. This is useful when working with PDFs from an InputStream: PDFs imported from a memory stream can be checked for a text layer without first writing the stream to a temporary file, which increases overall processing speed.

Ability to rasterize FreeText annotations

When processing PDF documents that contain Text Box annotations and exporting them to PDF, it is now possible to retain all information from annotations of the FreeText type in the output PDF.

Export for multi-page PDF documents with an undefined number of pages

This feature increases efficiency when scanning large multi-page documents. The new export approach introduced in the previous release has been modified: a recognition session can now be created even if the number of pages in the document sent for processing is not known. When scanning multi-page documents, the number of pages is typically known only after scanning is complete. The modified export API allows pages to be sent for recognition while the remaining pages of the document are still being scanned.

Ability to adjust a time zone for PDF export

In previous releases, the creation and modification dates were written to the PDF file in UTC. It is now possible to specify the time zone used for the creation and modification dates of exported documents. Several PDF viewer applications display the creation and modification dates of a document without taking the user’s time zone into account, and in some cases this missing information is important. The new options make it possible to control how the creation and modification dates are written for each PDF file.

[Technical preview] Faster PDF printing when using MRC compression

A new option in the set of MRC correction parameters makes it possible to tune Mixed Raster Content (MRC) settings for PDF export, which increases PDF printing speed. (At the moment, the feature is implemented as a technical preview.)

Improved readability of exported XML data for users

Default paragraph style names are now generated automatically from the paragraph’s role and the modifications applied to the style. This improves the readability of XML-based text and simplifies work for operators and system administrators. For additional flexibility, users can also set a paragraph style name manually.

Ability to exclude BOM during export to TXT

A new export option specifies whether the byte order mark (BOM) should appear at the start of the text stream when a document is exported to TXT format in UTF-8 encoding. This saves Java developers from writing workarounds to discard the BOM character at the beginning of the file.

Updated documentation for working with screenshots

New recommendations for processing of screenshots were added into the documentation to support developers with useful tips for this increasingly popular scenario.
 


Upgrade from the Previous Versions and Releases

Binary Incompatibility

The host application must be recompiled regardless of which version of the Engine was used previously.

API Changes

There are no API Changes.

Changes in Release 8 Update

IENGINE::INJECTTEXTLAYER METHOD IS MARKED AS DEPRECATED

IEngine::InjectTextLayer is deprecated and will be removed in the next major release. IEngine::InjectTextLayerEx method must be used instead.

For more information, see ‘Large document conversion to searchable PDFs improvement’.

SOME METHODS, OBJECTS AND INTERFACES RELATED TO ONE-PAGE DOCUMENT PROCESSING ARE MARKED AS DEPRECATED

The following methods, objects and interfaces are marked as deprecated and will be removed in the next major release:

Methods:

  • IEngine::ProcessPage
  • IEngine::ProcessPagesEx
  • IEngine::ExportPage
  • IEngine::ExportPagesEx
  • IEngine::SynthesizePagesEx
  • IEngine::OpenImageFile
  • IEngine::PrepareImageFile
  • IEngine::CreateLayout

Objects:

  • DocumentAnalyzer
  • DocumentInfo
  • Exporter

Interfaces:

  • IDocumentAnalyzerEvents
  • IRecognizedPages
  • IExporterEvents

One-page document processing can be done using the properties and methods of the Engine, FRPage and FRDocument objects, as follows:

One-page API (deprecated) and the API to be used instead

Each entry lists the deprecated one-page API member, followed by the API to be used and, where the table provides one, a description.

IDocumentAnalyzer interface is the basic element of the one-page API. It is used for the majority of operations with documents.

  • PreprocessAnalyzeRecognizePage method: IFRPage::PreprocessAnalyzeRecognize
  • PreprocessPage method: IFRPage::Preprocess
  • CorrectGeometricalDistortions method: IFRPage::CorrectGeometricalDistortions
  • DetectOrientation method: IFRPage::DetectOrientation
  • FindPageSplitPosition method: IFRPage::FindPageSplitPosition
  • AnalyzePage method: IFRPage::Analyze
  • ExtractBarcodes method: IFRPage::ExtractBarcodes
  • AnalyzeRegion method: IFRPage::AnalyzeRegion
  • AnalyzeTable method: IFRPage::AnalyzeTable
  • RecognizePage method: IFRPage::Recognize
  • RecognizeBlocks method: IFRPage::RecognizeBlocks
  • RecognizeImageAsPlainText method: recognize using IFRDocument::Process, then use the IFRDocument::PlainText attribute
  • RecognizeImageDocumentAsPlainText method: recognize using IFRPage::PreprocessAnalyzeRecognize, then use the IFRDocument::PlainText attribute. The main difference from RecognizeImageAsPlainText is that RecognizeImageAsPlainText creates the IImageDocument document inside the method, whereas RecognizeImageDocumentAsPlainText receives the document as an input parameter.
  • PreprocessAnalyzeRecognizePagesEx method: IFRDocument::Process. IRecognizedPages is deprecated, and IFRDocument already has the functionality for managing multi-page documents.
  • PreprocessPagesEx method: IFRDocument::Preprocess
  • AnalyzePagesEx method: IFRDocument::Analyze
  • RecognizePagesEx method: IFRDocument::Recognize
  • LearnCheckmarks method: IFRPage::LearnCheckmarks
  • CleanRecognizerSession method: the method moves to IEngine::CleanRecognizerSession
  • AddWordToCacheDictionary, AddWordsToCacheDictionary and CleanCacheDictionary methods: at the moment there is no analogue for these methods. We are considering implementing them in the IEngine interface.
  • AutoCleanRecognizerSession attribute: the attribute moves to IEngine::AutoCleanRecognizerSession

IDocumentAnalyzerEvents interface is a callback interface for handling events that occur during document processing.

  • OnRegionProcessed method: IFRPageEvents::OnRegionProcessed
  • OnProgress method: IFRPageEvents::OnProgress, IFRDocumentEvents::OnProgress
  • OnWarning method: IFRPageEvents::OnWarning, IFRDocumentEvents::OnWarning

IDocumentInfo interface is an object that contains information about the document obtained during document processing.

  • AddImageDocument method: IFRDocument::AddImageDocument. It is a wrapper on IFRDocument. The specified IImageDocument document is referred to it, so it is possible to get information about this document.
  • DocumentContentInfo method: IFRDocument::DocumentContentInfo

IEngine interface is the main entry point for working with FREngine.

  • ProcessPage method: IFRDocument::Process. Methods with the *PagesEx suffix use the outdated IRecognizedPages.
  • ProcessPagesEx method
  • ExportPage method: IExporter::ExportPage
  • ExportPagesEx method: IExporter::ExportPageEx
  • SynthesizePagesEx method: IFRDocument::Synthesize
  • OpenImageFile method: IFRDocument::AddImageFile
  • PrepareImageFile method
  • CreateLayout method: no implementation. It is assumed that a document’s layout (ILayout) cannot exist without pages.

IExporter saves recognized text to different formats.

  • ExportPage method: IFRDocument::Export
  • ExportPagesEx method: IRecognizedPages is used as a mechanism for multi-page document processing.

IRecognizedPages is a callback interface for multi-page document processing in the one-page API.

  • PageIds, Layout, ImageDocument and ReleasePage methods: this interface is outdated because a mechanism for multi-page document processing is already built into IFRDocument.

IExporterEvents is a callback interface that makes it possible to monitor the document export process and its progress.

  • ReportPercentage method: implemented through the IFRDocumentEvents::OnProgress method
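
When migrating a host application, the mapping above can be kept as a lookup table and used to flag remaining calls to the deprecated one-page API. The following is a minimal sketch in Python, not part of FineReader Engine: it copies a few IDocumentAnalyzer rows from the table, and the `find_deprecated_calls` helper and the sample host code are hypothetical. The check is purely name-based, so it would also flag unrelated functions that happen to share a name.

```python
import re

# A few deprecated IDocumentAnalyzer methods and the replacements
# named for them in the table above (not the complete table).
REPLACEMENTS = {
    "PreprocessAnalyzeRecognizePage": "IFRPage::PreprocessAnalyzeRecognize",
    "PreprocessPage": "IFRPage::Preprocess",
    "AnalyzePage": "IFRPage::Analyze",
    "RecognizePage": "IFRPage::Recognize",
    "RecognizeBlocks": "IFRPage::RecognizeBlocks",
    "PreprocessAnalyzeRecognizePagesEx": "IFRDocument::Process",
    "AnalyzePagesEx": "IFRDocument::Analyze",
    "RecognizePagesEx": "IFRDocument::Recognize",
}

# Match a whole method name followed by an opening parenthesis.
_NAMES = sorted(REPLACEMENTS, key=len, reverse=True)
_PATTERN = re.compile(r"\b(" + "|".join(re.escape(n) for n in _NAMES) + r")\s*\(")


def find_deprecated_calls(source):
    """Return (line number, deprecated name, suggested replacement) tuples."""
    hits = []
    for number, line in enumerate(source.splitlines(), start=1):
        for match in _PATTERN.finditer(line):
            name = match.group(1)
            hits.append((number, name, REPLACEMENTS[name]))
    return hits


# Hypothetical host-application code, used only to exercise the helper.
sample = (
    "analyzer->AnalyzePage( page );\n"
    "frPage->Analyze();\n"
    "analyzer->RecognizePagesEx( pages );\n"
)
hits = find_deprecated_calls(sample)
for number, name, replacement in hits:
    print(f"line {number}: {name} is deprecated, consider {replacement}")
```

Rows without an explicit replacement in the table are left out of the lookup on purpose, so the helper never suggests an API that the release notes do not name.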

TEXTORIENTATION::ISVERTICALLYMIRRORED PROPERTY IS MARKED AS DEPRECATED

The TextOrientation::IsVerticallyMirrored property is marked as deprecated and will be removed in future versions. No scenarios have been found in which detecting a vertically mirrored orientation is necessary.

ADDED IPDFMRCPARAMS::USEMULTIPLEMASKS

A new property for faster printing of PDF files with MRC was added to FRE. For more information, see ‘New option for faster printing of PDF using MRC’.

ADDED IPREPAREIMAGEMODE::RASTERIZEFREETEXT

A new method for rasterizing FreeText annotations was added. For details, see ‘An opportunity to rasterize FreeText annotations’.

ADDED INTERFACE AND METHOD FOR TUNING NEW INJECTTEXTLAYEREX2 METHOD

The ITextLayerInjectionParams interface and the CreateTextLayerInjectionParams method were added. For details, see ‘Orientation and skew correction on text injection’.

INSERTTAB METHOD OF THE PARAGRAPH OBJECT WAS ADDED

The new method inserts a tab character at the chosen text position.

ITABPOSITIONS::ADDNEW IS MARKED AS DEPRECATED AND ITABPOSITIONS::ADDNEWEX METHOD WAS ADDED

The ITabPositions::AddNew method was not implemented and now returns the E_NOTIMPL code.

The ITabPositions::AddNewEx method was added to replace it.

Changes in behavior

AN ABILITY TO SAVE LICENSE COUNTERS VALUE ON LICENSE DEACTIVATION

A new section, 'ZeroLevel', was added to the License Wizard for FRE 11 Mac. The only option available in this section is ZeroLevel.

If this option is activated, the license counter value is saved on ABBYY Registration Server when the license is deactivated. On the next activation, the counter takes its value from ABBYY Registration Server.

New Features and Improvements

Release 8 Update

CONVERTING DOCUMENTS TO SEARCHABLE PDFS AT FULL THROTTLE NO LONGER REQUIRES SETTING THE NUMBER OF PAGES

Documents with an undefined number of pages can be converted to searchable PDF using ExportFileWriter. In this case the PagesCount parameter must be set to -1 (this parameter is deprecated and will be removed in future versions).

Customers no longer need to know the number of pages in a document when the recognition session is created.

This is useful for working efficiently with scanners: when many pages have to be processed, the number of pages is not known until scanning ends, yet processing needs to start before scanning has finished.

GARBAGE REMOVAL FROM COLOR IMAGES

The new method ImageDocument::RemoveGarbageEx works with both color and black-and-white images, so garbage can now be removed from color images.

[Figure: input image and output image]

On black-and-white images the method works the same way as ImageDocument::RemoveGarbage, which will be removed in future versions of FRE.

POSSIBILITY TO INJECT TEXT LAYER TO SELECTED PAGES OF PDF DOCUMENTS 

The new method IEngine::InjectTextLayerEx makes it possible to process specific pages of "image only" or "image on text" PDF documents. It creates a searchable PDF file that contains the same page images and an invisible text layer created from the recognized text of the document.

The new method works the same way as IEngine::InjectTextLayer and has two new arguments:

  • PageIndices. Refers to the IntsCollection object that specifies the indices of the document pages into which the text will be injected. This parameter is optional and may be 0, in which case the text is injected into all pages of the document.
  • ProcessingEvents. Refers to the interface of the user-implemented object that is used for reporting events to listeners. This parameter may be 0, in which case no callback is attached.

The IEngine::InjectTextLayer method is now deprecated and will be removed in future versions.

ABILITY TO ADJUST A TIME ZONE FOR PDF EXPORT

The new export parameter ReferenceTimeZone of the IPDFExportFeatures interface allows customers to set the time zone used for the creation and modification dates of exported documents.

Previously, FREngine always wrote the modification and creation dates in UTC. This was a problem for one of our customers: several Adobe products (e.g. Adobe Reader) display the creation and modification dates of a document without taking the user’s time zone into account.

The new option was added to address this. ReferenceTimeZone has three values: TZT_UTC, TZT_Local and TZT_Daylight, for UTC time, local time and local time considering wintertime, respectively. The default value of the parameter is TZT_UTC.
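
To see what the time zone changes in the file, it helps to look at how a PDF date string carries the offset. The sketch below is Python, not FineReader Engine code. It formats one instant in the PDF date convention D:YYYYMMDDHHmmSSOHH'mm', once in UTC and once with a +02:00 offset. The `pdf_date` helper is hypothetical, and the idea that TZT_UTC and TZT_Local correspond to these two outputs is an assumption rather than documented behavior.

```python
from datetime import datetime, timedelta, timezone


def pdf_date(moment):
    """Format a timezone-aware datetime as a PDF date string."""
    offset = moment.utcoffset()
    if offset is None:
        raise ValueError("moment must be timezone-aware")
    stamp = moment.strftime("D:%Y%m%d%H%M%S")
    total_minutes = int(offset.total_seconds() // 60)
    if total_minutes == 0:
        return stamp + "Z"  # "Z" marks universal time
    sign = "+" if total_minutes > 0 else "-"
    hours, minutes = divmod(abs(total_minutes), 60)
    return f"{stamp}{sign}{hours:02d}'{minutes:02d}'"


# The same instant written two ways.
created_utc = datetime(2017, 4, 28, 12, 0, 0, tzinfo=timezone.utc)
created_local = created_utc.astimezone(timezone(timedelta(hours=2)))

print(pdf_date(created_utc))    # assumed to correspond to TZT_UTC
print(pdf_date(created_local))  # assumed to correspond to TZT_Local at UTC+2
```

A viewer that ignores the offset shows 12:00 for the first string and 14:00 for the second, which is the difference described above.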

A NEW ARTICLE ABOUT WORKING WITH SCREENSHOTS IN DOCUMENTATION

An article on suitable FRE settings for screenshot processing, ‘I am working with the screenshot image. Are there any special recommendations for screenshot processing?’, was added to the FAQ section.

ORIENTATION AND SKEW CORRECTION ON TEXT INJECTION

All settings in the PrepareImageMode and PageProcessingParams objects, such as CorrectSkew and CorrectOrientation, work when passed to the InjectTextLayer method.

Sometimes customers have PDFs that contain a mixture of digitally published pages and scans. Digitally published pages do not need any correction, but scans may.

Starting with FRE 11 R8, the new InjectTextLayerEx2 method is available. It works the same way as InjectTextLayerEx, but can additionally correct the orientation and skew of input images.

The new TextLayerInjectionParams object makes it possible to tune how the input "image only" or "image on text" PDF files are processed and how the searchable PDF file is created by the InjectTextLayerEx2 method.

AN OPPORTUNITY TO CHECK PDF FILES PLACED IN MEMORY FOR A TEXT LAYER

Sometimes files have to be handled through an InputStream.

Previous versions of FRE 11 had only the IsPdfWithTextualContent method for checking PDFs for a text layer, and it required a string for FileName. A customer working with an InputStream needs to pass a byte array as the first parameter of the method. To check such files, they previously had to write the stream to a temporary file first, which was not acceptable because of the performance impact.

The new method IsPdfWithTextualContentFromStream accepts a reference to a read stream that contains the PDF file in which to detect the text layer.

AN OPPORTUNITY NOT TO WRITE BOM ON EXPORT TO TXT

Previously, FRE 11 always wrote a byte order mark (BOM) on export to TXT.

UTF-8 permits the BOM, but neither requires nor recommends its use. Byte order has no meaning in UTF-8; the BOM serves only to signal at the start of the text stream that it is encoded in UTF-8.

Java's UTF-8 decoder has a known behavior: it does not recognize this character as a BOM, so reading such a stream yields a sequence of characters that begins with U+FEFF.

Clients may therefore need to omit the BOM on export to TXT. The new export option WriteBomCharacter, available since FRE 11 R8, solves this problem.
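
The effect described above is easy to reproduce outside Java. The following Python sketch does not use FineReader Engine; the sample text and the `strip_bom` helper are made up for illustration. Python's plain "utf-8" codec behaves like the Java decoder in that it keeps the BOM as the character U+FEFF, while the "utf-8-sig" codec drops it.

```python
import codecs

# Bytes as they would appear in a UTF-8 TXT file that starts with a BOM.
data = codecs.BOM_UTF8 + "Invoice 42".encode("utf-8")

# A plain UTF-8 decoder keeps the BOM as the first character, U+FEFF.
plain = data.decode("utf-8")

# The "utf-8-sig" codec removes a leading BOM if one is present.
stripped = data.decode("utf-8-sig")


def strip_bom(text):
    """Remove one leading U+FEFF: the kind of workaround readers otherwise need."""
    return text[1:] if text.startswith("\ufeff") else text


print(repr(plain[0]))
print(stripped == strip_bom(plain))
```

Writing the file without a BOM makes such a workaround unnecessary on the reading side.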

HUMAN-READABLE NAMES OF PARAGRAPH STYLE ON EXPORT TO XML

The default names of paragraph styles were changed to make them easier for a human to read. The names are now formed from the role of the paragraph and the modifications applied to the style. Previously a GUID was used to name the styles, which was not informative.

Users can also set a paragraph style name manually using the IParagraphStyle::Name method.

ADDED ‘NOT ENOUGH DISK SPACE’ ERROR CODE

A new error message, ‘Not enough disk space’, was added for cases when there is not enough space on the disk.

NEW OPTION FOR FASTER PRINTING OF PDF USING MRC

Some of our customers complained that PDFs with Mixed Raster Content print too slowly.

FRE 11 R8 has a new option, UseMultipleMasks of IPDFMRCParams, for tuning MRC parameters. This option activates a special mode in which several monochrome masks are used instead of one multicolor mask, which leads to faster document printing.

This mode can only create documents with monochrome characters. For example, documents containing characters with a gradient fill cannot be created using the new option.

The feature is implemented as a technical preview and has not been tested for wide use, so using the new mode may cause issues.

Performance results

This section contains performance results for this release compared with previous releases. The test machine has an Intel® Core™ i5-3450 CPU (3.10GHz, 4 physical cores) and 8GB of RAM.

[Charts: performance results for English, Japanese, Korean, Chinese PRC, Chinese Taiwan and Arabic]

Fixed Bugs

 

This section contains a list of bugs reported by customers that have been fixed.

A four-point scale helps you evaluate the severity of each issue, so that you can make an informed decision about how important the updates are for your system.

  • Critical: A bug that causes the software to crash or hang. Critical bugs can include access violations, internal program errors, stack overflow, out of memory or other exceptions that can lead to program failure.
  • Major: A bug that does not cause program failure but affects major functionality of a feature or impairs the system’s performance. Major bugs can include disparity between the feature functionality and the internal specifications, memory leaks or data corruption.
  • Minor: A bug that leads to feature malfunctioning or affects minor functionality of the software. Minor bugs can include recognition errors, missing or lost objects, wrong color detection, incorrect document analysis, license counter errors, etc.
  • Trivial: A cosmetic issue that does not affect the functionality of the product but can cause inconvenience. Trivial bugs can include Help file errors, log errors, incomplete information in error messages, etc.

 

The following table contains bugs fixed in this release sorted in descending order of severity. If the bugs have workarounds, root causes or side effects, they will be mentioned in the Description section.

 

Severity Description Technology Subsystem HD # Office
Critical Application failure when exporting PDF processing results to PDF, when the Engine is loaded using GetEngineObjectEx and FREngineTempFolder is set by default. API 586844 3A
Critical An error occurs when adding a password-protected PDF document. API 549575 3A

 

Known Issues and Workarounds

Mac OS version limitations

The following functionality of FineReader Engine 11 for Windows is not available in the Mac OS version:

  • DjVu opening
  • Scanning
  • ICR/OMR
  • Visual Components and other GUI elements
  • WDP/WIC/BITMAP input formats and other Windows-specific functionality
  • PDF text layer reusing
  • Attachments extraction from PDF
  • Bookmarks extraction from PDF
  • Metadata extraction from PDF (the information about author, keywords, subject and title)

Java wrapper is not included into the distribution

Although we have a partly functional Java wrapper for the Engine, it is still missing some important parts.

In some cases the current version of the wrapper is sufficient, so please consult the HQ product analyst in case of urgent need.

Some APIs are not implemented

The following APIs are not implemented in FRE 10 and FRE 11:

  • IFootnoteSeries::HasSeparator. Always returns “true”.
  • ITextPicture::ColumnNumber. Always returns “0”.
  • ICharParams::IsWordStart. Always returns “false”. It is true only for character parameters obtained through the IWordRecognitionVariants interface.
  • IIncut::TextWrapping. Always returns “TW_Undefined”.
  • IRunningTitlesSeriesText::HasSeparator. Always returns “false”.

 

The implementation is not planned.

Farsi language is based on Arabic language

Farsi OCR output reports the language ID as Arabic (CharParams::LanguageId = LI_ArabicSaudiArabia). This is because the Farsi technology preview is based on the Arabic OCR language.

The fix will be available in FRE 12.

IPDFMRCParams::MonochromeText doesn’t work correctly

Different compression algorithms are used during export to PDF depending on whether IPDFMRCParams::MonochromeText is set to FALSE or DEFAULT, even though the default value is FALSE.

Language auto detection uses CJK resources though none is selected for OCR

‘Cjk.*’ resource files are used when RecognizerParams::LanguageDetectionMode = TSPV_Yes, even though the set of recognition languages does not include any CJK languages.

The fix will be available in FRE 11 R7.

The proxy feature description is missing in the documentation

30% slowdown in all scenarios

Internal tests show that this release is 30% slower than R2.

We are investigating the reason and working on a patch.

AddImageFileFromMemory does not open PDF files

An attempt to open a PDF file from memory fails with the error ‘The image file you specify is empty.’

R7 will include a fix.
