ABBYY FineReader Engine 12 for Mac - Release 4 (includes Update 1)
- Release date: 07.01.2020 (public release)
- Part #: 1377/3
- Build #: 12.4.7.981
New input formats: Office documents
In addition to the most common image formats and PDFs (both scanned image PDFs and PDFs created on a computer via a print function), ABBYY FineReader Engine can now process the most common Office documents, such as text documents, spreadsheets and presentations:
- Text documents: doc, docx, rtf, htm / html, txt, odt
- Spreadsheets: xls, xlsx, ods
- Presentations: ppt, pptx, odp
The new input formats can significantly extend the functionality of systems that process mixed document types or email attachments, for example to route emails to the relevant departments or to automatically import email attachments into document management systems.
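As a rough illustration of the routing scenario, the extension lists above can drive a simple pre-sorting step before files are handed to the Engine. The sketch below is plain Python and not part of the FineReader Engine API; the function name `office_category` and the category labels are hypothetical.

```python
from pathlib import Path

# Extension lists copied from the release notes above.
OFFICE_FORMATS = {
    "text": {"doc", "docx", "rtf", "htm", "html", "txt", "odt"},
    "spreadsheet": {"xls", "xlsx", "ods"},
    "presentation": {"ppt", "pptx", "odp"},
}


def office_category(filename):
    """Return the Office category for a file name, or None if the
    extension is not one of the newly supported Office formats."""
    ext = Path(filename).suffix.lower().lstrip(".")
    for category, extensions in OFFICE_FORMATS.items():
        if ext in extensions:
            return category
    return None
```

A mail-routing system could call `office_category` on each attachment name and send files that return `None` down the existing image/PDF path.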
Added: Intelligent Character Recognition Technology (ICR)
- High-quality technology for recognition of hand-printed text (ICR) has been added to FineReader Engine for Mac. The technology extracts information entered by hand in individual fields, as used, for example, on application forms or customer onboarding documents.
Added: Optical Mark Recognition Technology (OMR)
- Advanced technology for recognition of optical marks (OMR) has been added to FineReader Engine for Mac. The technology extracts information about selected fields on surveys, questionnaires or multiple-choice exam sheets.
In the past, the ICR and OMR technologies were only available in the Windows version and were later added to the Linux version. With this release, the technologies are also available in the Mac version.
New: Ability to extract information from MRZ in ID documents
- Machine-readable zones (MRZ) are used in ID documents to encode personal information. In the MRZ, the personal information is printed as 2 or 3 lines of text in a format specified in ICAO Document 9303.
- With the ability to extract information from the machine-readable zones in ID documents, FineReader Engine can be used in a range of solutions such as:
  - ID verification systems that quickly extract personal information from ID documents and compare it with information in centralized database systems
  - Client onboarding solutions in banks, insurance companies, hotels, and car rental providers that quickly extract personal information from ID documents and insert it into the company's databases
  - HR onboarding systems in companies that quickly capture personal data of new employees
- The MRZ extraction function was enhanced with new document format enums to accurately attribute extracted data to the Optional data and Personal number fields. In addition, IMrzData has received a new property that indicates whether a check digit for the whole document data is available.
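For readers unfamiliar with MRZ check digits, the calculation defined in ICAO Document 9303 can be sketched as follows: digits count as their own value, letters A to Z as 10 to 35, the filler character `<` as 0; the values are multiplied by the repeating weights 7, 3, 1 and the sum is taken modulo 10. This is an independent illustration, not FineReader Engine code, and the function names are hypothetical.

```python
def mrz_char_value(ch):
    """Map one MRZ character to its numeric value per ICAO Doc 9303."""
    if "0" <= ch <= "9":
        return int(ch)
    if "A" <= ch <= "Z":
        return ord(ch) - ord("A") + 10
    if ch == "<":
        return 0
    raise ValueError(f"invalid MRZ character: {ch!r}")


def mrz_check_digit(field):
    """Compute the check digit of an MRZ field using weights 7, 3, 1."""
    weights = (7, 3, 1)
    total = sum(mrz_char_value(ch) * weights[i % 3]
                for i, ch in enumerate(field))
    return total % 10
```

With the specimen values from ICAO Doc 9303, `mrz_check_digit("L898902C3")` returns 6 and `mrz_check_digit("740812")` returns 2. An application can use such a check to validate extracted fields before writing them to a database.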
New: 'Compare Documents' Module
- To let users quickly verify a document's integrity, the new 'Compare Documents' module in ABBYY FineReader Engine detects content differences between two versions of the same document. The module works with documents in formats such as Microsoft Word or PDF, with document images such as JPEG or TIFF, and with many other formats, and it can compare documents in all OCR languages supported by FineReader Engine 12. The results of the comparison are available through the API and can also be delivered as a Microsoft Word document with tracked changes, which makes the content inconsistencies clearly visible.
- This feature is of great value to business customers because it allows quick detection of possible manipulations of document content, for example when comparing an originally created contract with its printed and signed version.
- For easy implementation and demonstration, a new ready-to-use code sample with sample documents is available and can be reused in your own applications to speed up development. The sample compares the selected files and, if necessary, saves the detected differences to a file of the specified format. The comparison result contains the differences in the textual content, the type of each modification (deleted, inserted or modified text) and the locations of the modification in the original and in its copy. The code sample supports Russian and English. The 'Compare Documents' functionality itself, however, is available for documents in all languages supported by FineReader Engine.
- The option IComparisonParams::CompareTablesSeparatelyFromText further improves the detection of changes in documents that contain both free-flow text and text within tables. It specifies whether text changes detected within tables should be displayed separately from the modifications detected in free-flow text. By default, this property is set to FALSE.
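The kind of result described above (modification type plus location in the original and in the copy) can be illustrated with a small word-level diff. The sketch below uses Python's standard `difflib` on plain text that has already been extracted; it is not the Engine's comparison algorithm or API, and the function name and result keys are hypothetical.

```python
from difflib import SequenceMatcher

# Map difflib opcode tags to the modification types named in the notes.
_KINDS = {"delete": "deleted", "insert": "inserted", "replace": "modified"}


def compare_texts(original, copy):
    """Return word-level differences between two plain-text versions."""
    a, b = original.split(), copy.split()
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        changes.append({
            "type": _KINDS[tag],
            "original_words": (i1, i2),   # word index range in the original
            "copy_words": (j1, j2),       # word index range in the copy
            "original_text": " ".join(a[i1:i2]),
            "copy_text": " ".join(b[j1:j2]),
        })
    return changes
```

Comparing `"the fee is 100 EUR"` with `"the fee is 900 EUR"` yields a single entry of type `modified` whose texts are `100` and `900`.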
New OCR languages
- Georgian
The Georgian language was added as a new OCR language, supporting recognition of Georgian documents set in the Sylfaen font, which is used in most Georgian documents. Formatting such as bold, underlined and italic text is supported. This language does not include dictionary support.
Limitations: The Georgian scripts Asomtavruli (“capitals”) and Nuskhuri (“small letters”) will be detected as such but exported as Nuskhuri in the results. Asomtavruli letters will receive a special flag in the recognition results to distinguish them from Nuskhuri and to indicate that they are capital letters.
- Simple mathematical formulas
ABBYY FineReader Engine can now extract the characters of simple mathematical formulas. A new OCR language was created that supports the following characters:
- !"%'()*+,-.
- 0123456789
- <=>
- ABCDEFGHIJKLMNOPQRSTUVWXYZ
- []^
- abcdefghijklmnopqrstuvwxyz
- {}~°±¼½¾×÷ħΑΒΓΔΕΖΗΘΙΚΛΜΝΞΟΠΡΣΤΥΦΧΨΩαβγδεζηθικλμνξοπρςστυφχψω
- ‰⅓⅔⅕⅖⅗⅘⅙⅚⅛⅜⅝⅞
- ↑→↦⇒⇔∀∂∃∄∆∈∉∏∑∕∖∙√∞∟∠∣∧∨∩∪∫≈≠≡≤≥≪≫⊂⊃⊄⊅⊆⊇⊊⊋⊥⋮□
The new OCR language recognizes simple math formulas written with this set of characters. For example, x+y=⅓±2 will be recognized. (Limitations: Very complex fractions, integrals or roots cannot be recognized.)
Enhanced OCR for Asian languages - with the support of Artificial Intelligence
The Convolutional Neural Network for recognition of Asian languages was retrained on thousands of documents and now provides significantly better recognition of documents in Japanese and Chinese. The following improvements were achieved:
- Significantly faster recognition of Korean
- Faster recognition of Chinese
- Increased speed & accuracy in recognition of Japanese (Modern)
Improved: Korean and Arabic OCR with new AI-based algorithms
- Arabic OCR: Significantly improved recognition on low-quality images with a new Recurrent Neural Network
  - To significantly increase the recognition accuracy of Arabic, especially on low-quality images (where the 'traditional OCR approach' might not deliver sufficiently accurate results), a new Recurrent Neural Network (RNN) for recognition of Arabic was trained on an extensive set of documents.
  - This new AI-based technology enables end-to-end text recognition, which delivers very accurate results. During end-to-end recognition, the pre-trained Recurrent Neural Network recognizes text 'as a whole', based on the 'knowledge' it acquired during training, without first dividing the text strings into individual characters.
  - To deliver the optimal balance between processing speed and recognition accuracy, an intelligent built-in classifier analyzes the document before the actual recognition step and selects the appropriate recognition method for each text snippet: the faster 'traditional OCR' or the slower but more accurate 'end-to-end OCR'.
- Korean OCR: Significantly better recognition results with a new Deep Learning Language model
  - To further increase the recognition accuracy of Korean, a new Deep Learning Language model was trained on a large number of documents.
  - After the actual text recognition step, this model (trained and deployed for recognition of Korean) analyzes the recognition hypotheses and selects the best 'word recognition variant' among them. In some cases the model even generates a new recognition hypothesis from the context: it analyzes the preceding and following words and creates the new hypothesis based on them.
  - To optimize the balance between recognition accuracy and speed, a smart built-in classifier decides whether the new Deep Learning Language model needs to be applied. The model is more accurate but slower than the 'traditional' evaluation of recognition hypotheses and is therefore not used by default. The feature works in 'normal' recognition mode only.
Enhanced: Text-based classifier with advanced security of training data
- To train and optimize the text-based classifier, documents representing each document category must be imported. To protect the data contained in these training documents, hashing algorithms are used to impede the recovery of information from the sample documents.
- Because the training algorithms use only the checksums of the documents, a pre-trained text-based classifier can be passed on to other users (who can further optimize its quality by re-training it on their own documents) without the risk of exposing information from the documents originally used for its training.
- Note: The provided API allows adding information to each individual training object, so that error messages during training can deliver useful information about the particular source files. If this option is used while creating, training and testing the text-based classifier, the pre-trained classifier must be rebuilt without the information about individual training objects before it is delivered to its final users, in order to maintain a high level of security and confidentiality.
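The general idea of training on hashed features instead of readable text can be sketched as follows. This is an illustration only: it assumes a simple bag-of-words representation with SHA-256 token hashes, whereas the release notes do not describe the algorithms the Engine actually uses. The function name `hashed_features` and the `salt` parameter are hypothetical.

```python
import hashlib
from collections import Counter


def hashed_features(text, salt=b""):
    """Return token counts keyed by truncated SHA-256 digests, so the
    stored features do not contain the original words."""
    counts = Counter()
    for token in text.lower().split():
        digest = hashlib.sha256(salt + token.encode("utf-8")).hexdigest()
        counts[digest[:16]] += 1  # keep 64 bits of the digest as the key
    return counts
```

The same text always maps to the same features, so a classifier can still be trained and re-trained on them. Unsalted hashes of common words can be guessed by hashing a dictionary, which is why the sketch accepts a secret salt.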
Enhanced: Classification Demo Sample - now with Office format documents
- ABBYY FineReader Engine can process PDFs, scanned or photographed document images, and documents in Office formats. To reflect this capability in the classification process, the provided classification Demo Sample was enhanced and now allows importing Office documents in addition to PDFs and image formats.
In addition, new sample documents for the image-based classifier were included and can be used to test the classification capabilities.
Enhancements in PDF export
- An extended set of tags on export to tagged PDF allows creating PDFs that are compliant with the following regulations:
- FineReader Engine can save creation and modification dates during export to PDF. The following information can be recorded during the PDF export step:
  - Date of creation
  - Date of modification
  - Both dates (creation and modification)
- This information might be critical for document archiving, document references or dispute resolution processes.
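In a PDF file, these dates are stored in the document information dictionary (`CreationDate`, `ModDate`) as strings of the form `D:YYYYMMDDHHmmSS` followed by a time zone offset. The sketch below formats a Python `datetime` that way. It only illustrates the date format from the PDF specification and is not FineReader Engine API; the helper name `pdf_date` is hypothetical.

```python
from datetime import datetime, timedelta, timezone


def pdf_date(dt):
    """Format a timezone-aware datetime as a PDF date string."""
    offset = dt.utcoffset()
    if offset is None:
        raise ValueError("datetime must be timezone-aware")
    base = dt.strftime("D:%Y%m%d%H%M%S")
    total_minutes = int(offset.total_seconds() // 60)
    if total_minutes == 0:
        return base + "Z"  # UTC
    sign = "+" if total_minutes > 0 else "-"
    hours, minutes = divmod(abs(total_minutes), 60)
    return f"{base}{sign}{hours:02d}'{minutes:02d}'"
```

For the release date of this build at noon UTC, `pdf_date(datetime(2020, 1, 7, 12, 0, 0, tzinfo=timezone.utc))` gives `D:20200107120000Z`.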
Improved: Document layout preservation
- To improve the detection and recreation of document layout, a new 'single-column' document model was introduced that provides more exact detection and analysis of tables and charts.
- The new 'single-column' analysis is a key subtask of the complete document analysis process. It uses specific algorithms for analyzing and processing document columns, that is, objects that are arranged linearly from top to bottom and separated vertically. Such objects can contain:
  - continuous text block
  - table of contents
  - picture
  - chart
  - table
  - screenshot
  - agglomerate of independent cells
- The new 'single-column' analysis significantly improves the detection and recreation of document layout and works in the default (normal) document analysis mode.
Document processing in memory for the Batch Processor
This new feature makes it possible to process documents in memory using the BatchProcessor object. The new approach can reduce the amount of free disk space required and increase the overall processing speed. Previously, document processing in memory was only available for the FRDocument object.
Ability to recognize documents with different text types
If a page contains text areas of different text types, the correct text type is detected and used.
In the past, this functionality was only available in the Windows version and was later added to the Linux version. With this release, the detection of individual text types on one page is also available in the Mac version.
Enhanced: Java wrapper documentation
- New JavaDoc for the Java wrapper: To simplify the use of the API, the documentation of ABBYY FineReader Engine 12 has been extended, and the documentation for the Java wrapper is now provided in JavaDoc format in addition to the HTML and PDF formats.
Enhanced documentation
- The product documentation contains a new article that describes how to deploy FineReader Engine in Docker containers.
- The EULA information is stored in FineReader Engine files for later reference.
Other improvements
- Compatibility with the latest Java releases:
  - Allows leveraging modern development environments
- Ability to detect the build number of FineReader Engine:
  - In scenarios where FineReader Engine is used as an optional module, the ability to detect the build number of FineReader Engine before loading the Java wrapper makes it possible to automatically check compatibility with the host system prior to loading FineReader Engine's libraries.
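A compatibility check of this kind usually comes down to comparing dotted build numbers numerically rather than as strings. The sketch below assumes the build number is already available as a string in the format shown at the top of these notes (for example `12.4.7.981`). How it is obtained from the Engine is not covered here, and the function names and the minimum build used in the usage note are hypothetical.

```python
def parse_build(build):
    """Turn a dotted build string such as '12.4.7.981' into a tuple of ints."""
    return tuple(int(part) for part in build.strip().split("."))


def is_compatible(installed, minimum):
    """Return True if the installed build is at least the minimum build."""
    return parse_build(installed) >= parse_build(minimum)
```

For example, `is_compatible("12.4.7.981", "12.4.0.0")` is true, so a host application could go on to load the wrapper. Tuple comparison avoids the string-comparison pitfall where "12.10" would sort before "12.9".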
IMPORTANT INFORMATION
- The GetEngine function was deprecated in R2. To load the Engine object, please use the InitializeEngine function. It provides a unified Engine loading procedure for all license types (including the Online License).
- Customers who use the GetEngine function and are updating from previous versions of FineReader Engine (such as version 9, 10, or 11), or upgrading from FineReader Engine 12 Release 1 to Release 2 or higher, will receive an error message if they keep using it in later releases. Please update your code and replace GetEngine with the InitializeEngine function.
- Open Office format support: To enable the import of Office formats, OpenOffice, LibreOffice or Microsoft Office must be installed manually. ABBYY does not provide an office suite through the automatic installation procedure.