ABBYY FineReader Engine 11 for Mac - Release 8
- Release date: 28 April 2017 (public release)
- Part #: 1161/27
- Build #: 11.1.19.872047
Ability to remove garbage from color images
Ability to inject a text layer into selected pages of a PDF document
Extended method of injecting text into PDF
Extended method for detecting a text layer in PDFs
Ability to rasterize FreeText annotations
Export of multi-page PDF documents with an undefined number of pages
Ability to adjust a time zone for PDF export
[Technical preview] Faster PDF printing when using MRC compression
Improved readability of exported XML data for users
Ability to exclude BOM during export to TXT
Updated documentation for working with screenshots
Upgrade from the Previous Versions and Releases
Binary Incompatibility
The host application must be recompiled regardless of which Engine version was used previously.
API Changes
There are no API Changes.
Changes in Release 8 Update
IENGINE::INJECTTEXTLAYER METHOD IS MARKED AS DEPRECATED
IEngine::InjectTextLayer is deprecated and will be removed in the next major release. Use the IEngine::InjectTextLayerEx method instead.
For more information, see ‘Large document conversion to searchable PDFs improvement’.
SOME METHODS, OBJECTS AND INTERFACES RELATED TO ONE-PAGE DOCUMENT PROCESSING ARE MARKED AS DEPRECATED
The following methods, objects, and interfaces are marked as deprecated and will be removed in the next major release:
Methods:
- IEngine::ProcessPage
- IEngine::ProcessPagesEx
- IEngine::ExportPage
- IEngine::ExportPagesEx
- IEngine::SynthesizePagesEx
- IEngine::OpenImageFile
- IEngine::PrepareImageFile
- IEngine::CreateLayout
Objects:
- DocumentAnalyzer
- DocumentInfo
- Exporter
Interfaces:
- IDocumentAnalyzerEvents
- IRecognizedPages
- IExporterEvents
One-page document processing can be done using the properties and methods of the Engine, FRPage, and FRDocument objects, as follows:
| One-page API (deprecated) | API to be used | Description |
| IDocumentAnalyzer interface is the basic element of the one-page API. It is used for most operations with documents. | | |
| PreprocessAnalyzeRecognizePage method | IFRPage::PreprocessAnalyzeRecognize | |
| PreprocessPage method | IFRPage::Preprocess | |
| CorrectGeometricalDistortions method | IFRPage::CorrectGeometricalDistortions | |
| DetectOrientation method | IFRPage::DetectOrientation | |
| FindPageSplitPosition method | IFRPage::FindPageSplitPosition | |
| AnalyzePage method | IFRPage::Analyze | |
| ExtractBarcodes method | IFRPage::ExtractBarcodes | |
| AnalyzeRegion method | IFRPage::AnalyzeRegion | |
| AnalyzeTable method | IFRPage::AnalyzeTable | |
| RecognizePage method | IFRPage::Recognize | |
| RecognizeBlocks method | IFRPage::RecognizeBlocks | |
| RecognizeImageAsPlainText method | Recognize using IFRDocument::Process, then read the IFRDocument::PlainText attribute | |
| RecognizeImageDocumentAsPlainText method | Recognize using IFRPage::PreprocessAnalyzeRecognize, then read the IFRDocument::PlainText attribute | Unlike RecognizeImageAsPlainText, which creates an IImageDocument internally, this method receives the document as an input parameter. |
| PreprocessAnalyzeRecognizePagesEx method | IFRDocument::Process | IRecognizedPages is deprecated, and IFRDocument already supports multi-page documents. |
| PreprocessPagesEx method | IFRDocument::Preprocess | |
| AnalyzePagesEx method | IFRDocument::Analyze | |
| RecognizePagesEx method | IFRDocument::Recognize | |
| LearnCheckmarks method | IFRPage::LearnCheckmarks | |
| CleanRecognizerSession method | IEngine::CleanRecognizerSession | The method has moved to IEngine. |
| AddWordToCacheDictionary method | No analogue at present | There is currently no analogue for these three methods. We are considering implementing them in the IEngine interface. |
| AddWordsToCacheDictionary method | No analogue at present | |
| CleanCacheDictionary method | No analogue at present | |
| AutoCleanRecognizerSession attribute | IEngine::AutoCleanRecognizerSession | The attribute has moved to IEngine. |
| IDocumentAnalyzerEvents interface is a callback interface for analyzing events that occur during document processing. | | |
| OnRegionProcessed method | IFRPageEvents::OnRegionProcessed | |
| OnProgress method | IFRPageEvents::OnProgress, IFRDocumentEvents::OnProgress | |
| OnWarning method | IFRPageEvents::OnWarning, IFRDocumentEvents::OnWarning | |
| IDocumentInfo interface contains information about the document that was obtained during document processing. | | |
| AddImageDocument method | IFRDocument::AddImageDocument | Acts as a wrapper over IFRDocument. The specified IImageDocument is referenced by it, so information about this document can be retrieved. |
| DocumentContentInfo method | IFRDocument::DocumentContentInfo | |
| IEngine interface is the main entry point for working with FREngine. | | |
| ProcessPage method | IFRDocument::Process | Methods with the *PagesEx suffix use the outdated IRecognizedPages interface. |
| ProcessPagesEx method | IFRDocument::Process | |
| ExportPage method | IExporter::ExportPage | |
| ExportPagesEx method | IExporter::ExportPagesEx | |
| SynthesizePagesEx method | IFRDocument::Synthesize | |
| OpenImageFile method | IFRDocument::AddImageFile | |
| PrepareImageFile method | IFRDocument::AddImageFile | |
| CreateLayout method | No implementation | A document's layout (ILayout) is assumed not to exist without pages. |
| IExporter interface is used to save recognized text in different formats. | | |
| ExportPage method | IFRDocument::Export | |
| ExportPagesEx method | | Uses IRecognizedPages as the mechanism for multi-page document processing. |
| IRecognizedPages is a callback interface for multi-page document processing in the one-page API. | | |
| PageIds method | | This interface is outdated because a mechanism for multi-page document processing is already built into IFRDocument. |
| Layout method | | |
| ImageDocument method | | |
| ReleasePage method | | |
| IExporterEvents is a callback interface for monitoring the document export process and its progress. | | |
| ReportPercentage method | IFRDocumentEvents::OnProgress | Implemented through the IFRDocumentEvents::OnProgress method. |
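Most rows in the mapping table above are one-to-one renames. A small script can therefore help you find deprecated calls in existing host code before you migrate. The sketch below is a hypothetical Python helper. The name find_deprecated and the particular subset of mappings are illustrative and are not part of FRE. The helper only flags identifiers and does not rewrite anything, because the argument lists of the old and new methods differ.

```python
import re

# Hypothetical helper: a subset of deprecated identifiers from the table above,
# mapped to the suggested replacement API.
REPLACEMENTS = {
    "PreprocessAnalyzeRecognizePage": "IFRPage::PreprocessAnalyzeRecognize",
    "PreprocessPage": "IFRPage::Preprocess",
    "AnalyzePage": "IFRPage::Analyze",
    "RecognizePage": "IFRPage::Recognize",
    "PreprocessAnalyzeRecognizePagesEx": "IFRDocument::Process",
    "PreprocessPagesEx": "IFRDocument::Preprocess",
    "AnalyzePagesEx": "IFRDocument::Analyze",
    "RecognizePagesEx": "IFRDocument::Recognize",
    "ProcessPage": "IFRDocument::Process",
    "SynthesizePagesEx": "IFRDocument::Synthesize",
    "OpenImageFile": "IFRDocument::AddImageFile",
    "InjectTextLayer": "IEngine::InjectTextLayerEx",
}

# Word boundaries on both sides, so "RecognizePage" does not match inside
# "RecognizePagesEx" and "InjectTextLayer" does not match "InjectTextLayerEx".
_PATTERN = re.compile(
    r"\b(" + "|".join(sorted(REPLACEMENTS, key=len, reverse=True)) + r")\b"
)


def find_deprecated(source: str) -> list[tuple[int, str, str]]:
    """Return (line number, deprecated name, suggested replacement) for each hit."""
    hits = []
    for line_no, line in enumerate(source.splitlines(), start=1):
        for match in _PATTERN.finditer(line):
            name = match.group(1)
            hits.append((line_no, name, REPLACEMENTS[name]))
    return hits
```

For example, running find_deprecated on a C++ source file lists every line that still calls OpenImageFile or RecognizePagesEx. Calls to InjectTextLayerEx are left alone.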
TEXTORIENTATION::ISVERTICALLYMIRRORED PROPERTY IS MARKED AS DEPRECATED
The TextOrientation::IsVerticallyMirrored property is marked as deprecated and will be removed in future versions. The reason is that no scenarios have been found in which detecting vertically mirrored orientation is necessary.
ADDED IPDFMRCPARAMS::USEMULTIPLEMASKS
A new property for faster printing of PDF files with MRC compression was added to FRE. For more information, see ‘New option for faster printing of PDF using MRC’.
ADDED IPREPAREIMAGEMODE::RASTERIZEFREETEXT
A new option for rasterizing FreeText annotations was added. For details, see ‘An opportunity to rasterize FreeText annotations’.
ADDED INTERFACE AND METHOD FOR TUNING NEW INJECTTEXTLAYEREX2 METHOD
The ITextLayerInjectionParams interface and the CreateTextLayerInjectionParams method were added. For details, see ‘Orientation and skew correction on text injection’.
INSERTTAB METHOD OF THE PARAGRAPH OBJECT WAS ADDED
The new method inserts a tab character at the specified position in the text.
ITABPOSITIONS::ADDNEW IS MARKED AS DEPRECATED AND ITABPOSITIONS::ADDNEWEX METHOD WAS ADDED
The ITabPositions::AddNew method was never implemented and now returns the E_NOTIMPL error code.
To replace this method, ITabPositions::AddNewEx was added.
Changes in behavior
AN ABILITY TO SAVE LICENSE COUNTERS VALUE ON LICENSE DEACTIVATION
A new 'ZeroLevel' section was added to the License Wizard for FRE 11 Mac. Its only option is ZeroLevel.
If this option is enabled, the license counter value is saved on the ABBYY Registration Server when the license is deactivated. On subsequent activation, the counter takes its value from the ABBYY Registration Server.
New Features and Improvements
Release 8 Update
CONVERTING DOCUMENTS TO SEARCHABLE PDFS AT FULL THROTTLE NO LONGER REQUIRES SETTING THE NUMBER OF PAGES
Documents with an undefined number of pages can now be converted to searchable PDF using ExportFileWriter. In this case, set the PagesCount parameter to -1 (this parameter is deprecated and will be removed in future versions).
Customers no longer need to know the number of pages in the document when they create a recognition session.
This is useful when working with scanners: many pages must be processed, and their total is not known until scanning ends, but processing has to start before scanning is finished.
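The scanner scenario above is a producer/consumer pipeline. The scanner pushes pages, and the converter consumes each one as it arrives, without ever being told the total. The sketch below models this in Python. StreamingPdfWriter is a hypothetical stand-in for an ExportFileWriter created with PagesCount = -1, and the recognition step is a placeholder string. None of the names are FRE API.

```python
import queue
import threading


class StreamingPdfWriter:
    """Hypothetical stand-in for a writer opened with PagesCount = -1."""

    def __init__(self, pages_count: int = -1):
        self.pages_count = pages_count  # -1 means "total number of pages unknown"
        self.written: list[str] = []
        self.closed = False

    def write_page(self, page: str) -> None:
        if self.closed:
            raise RuntimeError("writer already closed")
        self.written.append(page)

    def close(self) -> None:
        self.closed = True


_END_OF_SCAN = object()  # sentinel: the scanner has no more pages


def scan(pages, out_queue: queue.Queue) -> None:
    """Simulated scanner: pushes pages one at a time, then signals the end."""
    for page in pages:
        out_queue.put(page)
    out_queue.put(_END_OF_SCAN)


def convert_while_scanning(pages) -> StreamingPdfWriter:
    """Process each page as soon as it arrives; the count is known only at the end."""
    page_queue: queue.Queue = queue.Queue()
    writer = StreamingPdfWriter(pages_count=-1)
    scanner = threading.Thread(target=scan, args=(pages, page_queue))
    scanner.start()
    while True:
        page = page_queue.get()
        if page is _END_OF_SCAN:
            break
        writer.write_page(f"recognized({page})")  # placeholder for OCR + export
    scanner.join()
    writer.close()
    return writer
```

The consumer never reads pages_count. That is the point of the -1 value: the output grows page by page until the scanner signals that it has finished.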
GARBAGE REMOVAL FROM COLOR IMAGES
The new ImageDocument::RemoveGarbageEx method works with both color and black-and-white images, so garbage can now be removed from color images as well.
[Figure: input image and output image after garbage removal]
For black-and-white images, the method works the same way as ImageDocument::RemoveGarbage, which will be removed in future versions of FRE.
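In this context, "garbage" means small isolated specks of noise on the image. The sketch below shows one common, generic way to remove such specks: it finds connected groups of non-background pixels and drops those below a size threshold. Because it compares full pixel values, it works the same way for color pixels as for black-and-white ones. This is not ABBYY's algorithm. remove_specks, background, and min_size are illustrative names.

```python
from collections import deque


def remove_specks(image, background, min_size):
    """Return a copy of `image` (a list of rows of pixel values, e.g. RGB tuples)
    in which every 4-connected group of non-background pixels smaller than
    `min_size` pixels is repainted with the background color."""
    height = len(image)
    width = len(image[0]) if height else 0
    result = [row[:] for row in image]
    seen = [[False] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            if seen[y][x] or image[y][x] == background:
                continue
            # Breadth-first search collects one connected component.
            component = []
            seen[y][x] = True
            pending = deque([(y, x)])
            while pending:
                cy, cx = pending.popleft()
                component.append((cy, cx))
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and not seen[ny][nx] and image[ny][nx] != background):
                        seen[ny][nx] = True
                        pending.append((ny, nx))
            # Components below the threshold are treated as garbage.
            if len(component) < min_size:
                for cy, cx in component:
                    result[cy][cx] = background
    return result
```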
POSSIBILITY TO INJECT TEXT LAYER TO SELECTED PAGES OF PDF DOCUMENTS
The new IEngine::InjectTextLayerEx method processes specific pages of "image only" or "image on text" PDF documents. It creates a searchable PDF file that contains the same page images plus an invisible text layer built from the recognized text of the document.
The new method works the same way as IEngine::InjectTextLayer but has two new arguments:
- PageIndices. Refers to the IntsCollection object that specifies the indices of the document pages into which the text will be injected. This parameter is optional and may be 0, in which case the text is injected into all pages of the document.
- ProcessingEvents. Refers to the interface of the user-implemented object that is used for reporting events to the listeners. This parameter may be 0, in which case no callback will be attached.
The IEngine::InjectTextLayer method is now deprecated and will be removed in future versions.
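The following sketch summarizes the semantics of the optional PageIndices argument. It is a hypothetical Python model, not the FRE API. Here None stands for passing 0 instead of an IntsCollection. Zero-based page indices are an assumption that should be checked against the FRE reference documentation.

```python
def pages_to_inject(page_indices, page_count):
    """Hypothetical model of PageIndices in InjectTextLayerEx.

    None (i.e. passing 0) selects every page of the document; otherwise the
    listed indices are used as given. Indices are assumed to be zero-based.
    """
    if page_indices is None:
        return list(range(page_count))
    for index in page_indices:
        if not 0 <= index < page_count:
            raise IndexError(f"page index {index} out of range 0..{page_count - 1}")
    return list(page_indices)
```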
ABILITY TO ADJUST A TIME ZONE FOR PDF EXPORT
The new export parameter ReferenceTimeZone of the IPDFExportFeatures interface lets customers set the time zone used for the creation and modification dates of exported documents.
Previously, FREngine always wrote the creation and modification dates in UTC. This caused a problem for one of our customers, because several Adobe products (e.g. Adobe Reader) display these dates without taking the user's time zone into account.
To address this, new options were added. ReferenceTimeZone takes three values: TZT_UTC, TZT_Local, and TZT_Daylight, for UTC time, local time, and local time taking winter time into account, respectively. The default value is TZT_UTC.
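The setting matters because the same instant produces different wall-clock digits depending on which zone it is written in. The sketch below formats one moment as a PDF date string (D:YYYYMMDDHHmmSS followed by Z or an offset such as +03'00). It does this once for UTC, which is what FREngine wrote before (compare TZT_UTC), and once for fixed offsets that stand in for a user's local zone (compare TZT_Local). pdf_date is a hypothetical helper, and TZT_Daylight is not modeled. Some PDF producers also append a closing apostrophe after the minutes.

```python
from datetime import datetime, timedelta, timezone


def pdf_date(moment: datetime, tz: timezone) -> str:
    """Render an aware datetime as a PDF date string in the given time zone."""
    local = moment.astimezone(tz)
    stamp = local.strftime("D:%Y%m%d%H%M%S")
    offset = local.utcoffset()
    if offset == timedelta(0):
        return stamp + "Z"  # UTC is written with the Z designator
    total_minutes = int(offset.total_seconds()) // 60
    sign = "+" if total_minutes > 0 else "-"
    hours, minutes = divmod(abs(total_minutes), 60)
    return f"{stamp}{sign}{hours:02d}'{minutes:02d}"
```

A viewer that shows only the digits would display 12:00 for the UTC variant, even to a user in UTC+3 for whom the file was created at 15:00.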
A NEW ARTICLE ABOUT WORKING WITH SCREENSHOTS IN DOCUMENTATION
An article, ‘I am working with the screenshot image. Are there any special recommendations for screenshot processing?’, which describes suitable FRE settings for screenshot processing, was added to the FAQ section.
ORIENTATION AND SKEW CORRECTION ON TEXT INJECTION
All settings of the PrepareImageMode and PageProcessingParams objects, such as CorrectSkew and CorrectOrientation, work when passed to the InjectTextLayer method.
Sometimes customers have PDFs that contain a mixture of digitally published files and scans. Pages that are digitally published do not need any correction, but scans may need it.
Starting with FRE 11 R8, the new InjectTextLayerEx2 method is available. This method works the same way as InjectTextLayerEx but can additionally correct the orientation and skew of input images.
The new TextLayerInjectionParams object lets you tune how the input "image only" or "image on text" PDF files are processed and how the searchable PDF file is created by the InjectTextLayerEx2 method.
AN OPPORTUNITY TO CHECK PDF FILES PLACED IN MEMORY FOR A TEXT LAYER
Sometimes files have to be handled as an InputStream rather than read from disk.
Previous versions of FRE 11 had only the IsPdfWithTextualContent method for checking a PDF for a text layer, and it required a file name string. One customer works with InputStream and needed to pass a byte array instead. To check such files, they first had to write the stream to a temporary file, which was not acceptable to them because of the performance impact.
The new IsPdfWithTextualContentFromStream method accepts a reference to a read stream that contains the PDF file in which to detect the text layer.
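The stream-based method removes the temporary-file step. The PDF bytes already in memory only need to be exposed through a read-stream object. The sketch below shows such an adapter in Python. InMemoryReadStream and its read()/close() shape are assumptions for illustration. The exact read-stream interface that the Engine expects is defined in the FRE reference documentation.

```python
class InMemoryReadStream:
    """Hypothetical adapter: exposes a PDF already held in memory through a
    chunked read()/close() interface, so no temporary file is written."""

    def __init__(self, data: bytes):
        self._view = memoryview(data)  # zero-copy view over the caller's bytes
        self._position = 0
        self._closed = False

    def read(self, count: int) -> bytes:
        """Return up to `count` bytes; an empty result signals end of stream."""
        if self._closed:
            raise ValueError("stream is closed")
        chunk = self._view[self._position:self._position + count]
        self._position += len(chunk)
        return chunk.tobytes()

    def close(self) -> None:
        self._closed = True
```

A consumer pulls chunks until it receives an empty one, so the whole file never has to be written to disk or copied up front.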
AN OPPORTUNITY NOT TO WRITE BOM ON EXPORT TO TXT
FRE 11 always wrote a byte order mark (BOM) when exporting to TXT.
UTF-8 permits the BOM but does not require or recommend its use. Byte order has no meaning in UTF-8, so the BOM only signals, at the start of the stream, that the text is encoded in UTF-8.
Java's UTF-8 decoder has a known behavior: it does not recognize this character as a BOM, so reading such a stream yields text that begins with the character U+FEFF.
Therefore, in some scenarios clients need TXT export without a BOM. The new WriteBomCharacter export option, available since FRE 11 R8, solves this problem.
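The effect described above is easy to reproduce. Python's plain utf-8 codec behaves like Java's decoder here and keeps the BOM as U+FEFF, while the BOM-aware utf-8-sig codec removes it. The sketch shows the symptom and a client-side workaround for TXT files exported before WriteBomCharacter was available. strip_utf8_bom is an illustrative helper, not part of FRE.

```python
UTF8_BOM = b"\xef\xbb\xbf"


def strip_utf8_bom(data: bytes) -> bytes:
    """Drop a leading UTF-8 byte order mark, if present."""
    return data[len(UTF8_BOM):] if data.startswith(UTF8_BOM) else data


# Bytes as written by a TXT export that always emits a BOM.
exported = UTF8_BOM + "Hello".encode("utf-8")

text_plain = exported.decode("utf-8")           # '\ufeffHello': BOM kept as U+FEFF
text_sig = exported.decode("utf-8-sig")         # 'Hello': BOM-aware codec drops it
text_stripped = strip_utf8_bom(exported).decode("utf-8")  # 'Hello'
```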
HUMAN-READABLE NAMES OF PARAGRAPH STYLE ON EXPORT TO XML
The default names of paragraph styles were changed to be human-readable. Names are now formed from the role of the paragraph and the modifications applied to the style. Previously, a GUID was used as the style name, which was not informative.
Users can also set a paragraph style name manually using IParagraphStyle::Name.
ADDED ‘NOT ENOUGH DISK SPACE’ ERROR CODE
A new error message, ‘Not enough disk space’, was added for cases when there is not enough free space on the disk.
NEW OPTION FOR FASTER PRINTING OF PDF USING MRC
Some of our customers complained about slow printing of PDFs with Mixed Raster Content (MRC).
FRE 11 R8 adds a new UseMultipleMasks option to IPDFMRCParams for tuning MRC parameters. This option activates a special mode in which several monochrome masks are used instead of a single multicolor mask, which makes document printing faster.
In this mode, only documents with monochrome characters can be created. For example, characters with a gradient fill cannot be created with the new option.
The feature is implemented as a technical preview and has not been tested for wide use, so the new mode may cause some issues.
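The difference between the two modes can be pictured conceptually. In the default mode, a single foreground mask carries per-pixel colors. With UseMultipleMasks, the text layer is split into several binary masks, each drawn with one solid color. The sketch below illustrates only this idea in Python and is not FRE's MRC encoder. split_into_masks and the grid representation are assumptions.

```python
def split_into_masks(foreground):
    """Split a multicolor foreground layer into one monochrome mask per color.

    `foreground` is a list of rows; each cell holds a color value, or None
    where there is no text. Each returned mask is a grid of booleans that is
    meant to be drawn with a single solid fill color.
    """
    masks = {}
    for y, row in enumerate(foreground):
        for x, color in enumerate(row):
            if color is None:
                continue
            if color not in masks:
                masks[color] = [[False] * len(r) for r in foreground]
            masks[color][y][x] = True
    return masks
```

This also illustrates the limitation described above. A single-color mask carries no per-pixel color, so a glyph with a gradient fill does not map onto a few monochrome masks.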
Performance results
This section contains performance results of this release compared with previous releases. The testing machine has an Intel® Core™ i5-3450 CPU (3.10GHz, 4 physical cores) and 8GB of RAM.
[Performance charts by recognition language: English, Japanese, Korean, Chinese PRC, Chinese Taiwan, Arabic]
Fixed Bugs
This section lists customer-reported bugs that have been fixed.
A four-point scale helps you evaluate the severity of each issue, so that you can make an informed decision about how important the update is for your system.
| Severity | Definition |
| Critical | A bug that causes crashes or hangings of software. Critical bugs can include access violations, internal program errors, stack overflow, out of memory or other exceptions that can lead to program failure. |
| Major | A bug that does not cause program failure but affects major functionality of a feature or impairs the system’s performance. Major bugs can include mismatches between feature functionality and the internal specifications, memory leaks, or data corruption. |
| Minor | A bug that leads to feature malfunctioning or affects minor functionality of the software. Minor bugs can include recognition errors, missing or lost objects, wrong color detection, incorrect document analysis, license counter errors, etc. |
| Trivial | A cosmetic issue that does not affect the functionality of the product but can cause inconveniences. Trivial bugs can include Help file errors, log errors, incomplete information in error messages, etc. |
The following table lists the bugs fixed in this release, sorted in descending order of severity. Workarounds, root causes, and side effects, if any, are mentioned in the Description column.
| Severity | Description | Technology Subsystem | HD # | Office |
| Critical | Application failure when exporting the results of PDF processing to PDF, if the Engine is loaded using GetEngineObjectEx and FREngineTempFolder has its default value. | API | 586844 | 3A |
| Critical | An error occurs when adding a password-protected PDF document. | API | 549575 | 3A |
Known Issues and Workarounds
Mac OS version limitations
The following functionality of FineReader Engine 11 for Windows is not available in the Mac OS version:
- DjVu opening
- Scanning
- ICR/OMR
- Visual Components and other GUI elements
- WDP/WIC/BITMAP input formats and other Windows-specific functionality
- Reuse of the PDF text layer
- Extraction of attachments from PDF
- Extraction of bookmarks from PDF
- Extraction of metadata from PDF (author, keywords, subject, and title)
Java wrapper is not included in the distribution
Although a partly functional Java wrapper for the Engine exists, it still lacks some important parts.
In some cases the current version of the wrapper is sufficient, so in case of urgent need please consult the HQ product analyst.
Some API is not implemented
The following API is not implemented in FRE 10 and FRE 11:
- IFootnoteSeries::HasSeparator. Always returns “true”.
- ITextPicture::ColumnNumber. Always returns “0”.
- ICharParams::IsWordStart. Always returns “false”, except for character parameters obtained through the IWordRecognitionVariants interface.
- IIncut::TextWrapping. Always returns “TW_Undefined”.
- IRunningTitlesSeriesText::HasSeparator. Always returns “false”.
The implementation is not planned.
Farsi language is based on Arabic language
Farsi OCR output reports the language ID as Arabic (CharParams::LanguageId = LI_ArabicSaudiArabia), because the Farsi technology preview is based on the Arabic OCR language.
The fix will be available in FRE 12.
IPDFMRCParams::MonochromeText doesn’t work correctly
Different compression algorithms are used during export to PDF depending on whether IPDFMRCParams::MonochromeText is set to FALSE or to DEFAULT, even though the default value is FALSE.
Language auto detection uses CJK resources though none is selected for OCR
‘Cjk.*’ resource files are used when RecognizerParams::LanguageDetectionMode = TSPV_Yes, even though the set of recognition languages does not include any CJK language.
The fix will be available in FRE 11 R7.
The proxy feature description is missing in the documentation
30% slowdown in all scenarios
Internal tests show that this release is 30% slower than R2.
We are investigating the reason and working on a patch.
AddImageFileFromMemory does not open PDF files
An attempt to open a PDF file from memory fails with the error ‘The image file you specify is empty.’
R7 will include a fix.