ABBYY Cloud OCR SDK OCR Core Update Q3’21

New features

OCR technology update

To apply neural network approaches in its OCR technologies, ABBYY FineReader Engine has been enhanced with new features for processing handprinted text and Latin characters:

A new language model, used both to choose consistently among the word variants generated by OCR and to substitute new word variants.
ICR for Latin scripts, for better classification of single handwritten characters written neatly on one line, in a free field, or on a line, respecting the spaces between characters.
End-to-end recognition for Latin-based languages.
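To show how a language model can make the choice among OCR word variants consistent, here is a minimal sketch. It is not ABBYY's model. It assumes a toy bigram table of log-probabilities and combines each variant's recognizer confidence with the language-model score, picking the best sequence with the Viterbi algorithm. The names `choose_variants`, `bigram_logp`, `unseen_logp`, and `lm_weight` are hypothetical.

```python
import math

def choose_variants(candidates, bigram_logp, unseen_logp=-10.0, lm_weight=1.0):
    """Pick one word per position, maximizing
    sum(log OCR confidence) + lm_weight * sum(log P(word | previous word)).

    candidates:  list of positions, each a list of (word, confidence) pairs.
    bigram_logp: hypothetical dict {(prev_word, word): log-probability};
                 unseen pairs get the fixed back-off score `unseen_logp`.
    """
    # best[word] = (score, path) of the best path ending in `word` so far
    best = {"<s>": (0.0, [])}
    for position in candidates:
        new_best = {}
        for word, conf in position:
            ocr_score = math.log(conf)
            # Extend every surviving path and keep the best one ending in `word`
            new_best[word] = max(
                (score + ocr_score + lm_weight * bigram_logp.get((prev, word), unseen_logp),
                 path + [word])
                for prev, (score, path) in best.items()
            )
        best = new_best
    return max(best.values())[1]

# The recognizer slightly prefers "cot", but the language model favors "the cat".
variants = [[("the", 0.9), ("tbe", 0.8)], [("cat", 0.6), ("cot", 0.7)]]
lm = {("<s>", "the"): -1.0, ("the", "cat"): -2.0, ("the", "cot"): -6.0}
print(choose_variants(variants, lm))  # -> ['the', 'cat']
```

The `lm_weight` knob illustrates the usual trade-off: a higher weight lets the language model override the recognizer more often, which can also introduce substitutions when the text is unusual (names, codes).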

Machine learning barcode recognition technology

The new barcode recognition model is based on a neural network that classifies image pixels into two categories: barcode and non-barcode. Regions built around the connected components of barcode pixels are then treated as barcode hypotheses. Recognition is then started for each region, using the barcode type identified as the most probable for that region.
The processing speed of the new technology depends on image resolution but not on the number of barcode types to detect. As a result, it can be slower than the legacy technology on images with a resolution below 300 dpi and when only one or a few barcode types need to be detected.
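The region-building step described above can be sketched roughly as follows. This is an illustration, not ABBYY's implementation: it assumes a segmentation network has already produced a per-pixel barcode/non-barcode mask and per-pixel barcode-type probabilities (both hypothetical inputs), groups barcode pixels into 4-connected components, and returns each component's bounding box with the type that has the highest summed probability. Decoding the barcode itself is out of scope, and `barcode_hypotheses` is a hypothetical name.

```python
from collections import deque

def barcode_hypotheses(mask, type_probs):
    """mask: 2-D list of 0/1 values, 1 = pixel classified as 'barcode'.
    type_probs: 2-D list of dicts {barcode_type: probability}, one per pixel.
    Returns a list of ((x_min, y_min, x_max, y_max), most_probable_type)."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    hypotheses = []
    for y in range(h):
        for x in range(w):
            if not mask[y][x] or seen[y][x]:
                continue
            # Breadth-first flood fill of one 4-connected barcode component
            queue = deque([(y, x)])
            seen[y][x] = True
            xs, ys, votes = [], [], {}
            while queue:
                cy, cx = queue.popleft()
                xs.append(cx)
                ys.append(cy)
                # Accumulate per-pixel type probabilities over the component
                for btype, p in type_probs[cy][cx].items():
                    votes[btype] = votes.get(btype, 0.0) + p
                for ny, nx in ((cy + 1, cx), (cy - 1, cx), (cy, cx + 1), (cy, cx - 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            box = (min(xs), min(ys), max(xs), max(ys))
            best_type = max(votes, key=votes.get) if votes else None
            hypotheses.append((box, best_type))
    return hypotheses

mask = [
    [1, 1, 0, 0, 0, 0],
    [1, 1, 0, 0, 1, 1],
    [0, 0, 0, 0, 1, 1],
    [0, 0, 0, 0, 0, 0],
]
probs = [[{"QR": 0.8, "EAN-13": 0.2} if x < 3 else {"QR": 0.3, "EAN-13": 0.7}
          for x in range(6)] for _ in range(4)]
print(barcode_hypotheses(mask, probs))
# -> [((0, 0, 1, 1), 'QR'), ((4, 1, 5, 2), 'EAN-13')]
```

Because every pixel is visited regardless of how many barcode types exist, the cost in this sketch grows with image size rather than with the number of types, which matches the performance trade-off stated above.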

Improved table structure analysis

Thanks to an improved document conversion mechanism, ABBYY FineReader Engine can detect false vertical separators and correctly process tables whose numeric columns use the 'Accounting' format, in which the currency symbol ('$') is left-aligned in every cell.
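One way to picture the false-separator problem: in the 'Accounting' format the left-aligned '$' and the right-aligned number are far apart, so layout analysis may see two columns. The following minimal sketch, which is an assumption for illustration and not ABBYY's algorithm, merges a column that contains only '$' (or empty cells) with the following column when that column contains only numbers (including accounting-style negatives in parentheses). The function name `merge_currency_columns` and the `NUMBER` pattern are hypothetical.

```python
import re

# Numbers such as "1,200.00" or accounting-style negatives such as "(35.50)"
NUMBER = re.compile(r"^\(?-?[\d,]+(\.\d+)?\)?$")

def merge_currency_columns(rows):
    """rows: rectangular table as a list of rows of cell strings.
    Treats the separator between a '$'-only column and a numbers-only
    column as false and merges the two columns."""
    merged = [list(r) for r in rows]
    if not merged:
        return merged
    ncols = len(merged[0])
    col = 0
    while col < ncols - 1:
        cur = [r[col].strip() for r in merged]
        nxt = [r[col + 1].strip() for r in merged]
        is_currency = all(c in ("", "$") for c in cur) and any(c == "$" for c in cur)
        is_numeric = all(n == "" or NUMBER.match(n) for n in nxt)
        if is_currency and is_numeric:
            # Join '$' and the amount into one cell and drop the extra column
            for r in merged:
                r[col] = (r[col].strip() + " " + r[col + 1].strip()).strip()
                del r[col + 1]
            ncols -= 1
        else:
            col += 1
    return merged

table = [["Rent", "$", "1,200.00"], ["Fees", "$", "(35.50)"]]
print(merge_currency_columns(table))
# -> [['Rent', '$ 1,200.00'], ['Fees', '$ (35.50)']]
```

Requiring at least one '$' keeps the sketch from merging an entirely empty column into its neighbor.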

OCR improvements for text near stamps and signatures

To improve recognition results for agreements, a new neural network model for detecting stamps, logos, and signatures is now applied. The model detects these additional elements in the footer and requisites area of the document, excludes them from analysis, and isolates the text in the image while ignoring these details. The recognized blocks are placed on the image as if it were the same document without extraneous marks and stamps.
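The "exclude them from the analysis" step can be illustrated with a small sketch. It assumes, hypothetically, that a detector has returned bounding boxes for stamps, logos, and signatures, and that layout analysis works on element boxes in (x0, y0, x1, y1) form. Elements lying mostly inside a detected box are dropped, while text that only partly overlaps a stamp is kept. The names `overlap_fraction`, `exclude_detected_objects`, and the 0.8 threshold are illustrative choices, not ABBYY's.

```python
def overlap_fraction(inner, outer):
    """Fraction of `inner`'s area covered by `outer`.
    Boxes are (x0, y0, x1, y1) with exclusive upper bounds."""
    ix0, iy0 = max(inner[0], outer[0]), max(inner[1], outer[1])
    ix1, iy1 = min(inner[2], outer[2]), min(inner[3], outer[3])
    intersection = max(0, ix1 - ix0) * max(0, iy1 - iy0)
    area = (inner[2] - inner[0]) * (inner[3] - inner[1])
    return intersection / area if area else 0.0

def exclude_detected_objects(elements, object_boxes, threshold=0.8):
    """Keep only elements that are not mostly covered by any detected
    stamp/logo/signature box."""
    return [e for e in elements
            if all(overlap_fraction(e, b) < threshold for b in object_boxes)]

stamp = (100, 100, 200, 200)
elements = [(110, 110, 190, 190),   # inside the stamp: excluded
            (50, 150, 150, 170),    # text line half-covered by the stamp: kept
            (300, 300, 400, 320)]   # unrelated text: kept
print(exclude_detected_objects(elements, [stamp]))
# -> [(50, 150, 150, 170), (300, 300, 400, 320)]
```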

NeoML (open-source ML C++ library) usage

NeoML is an end-to-end machine learning framework that allows you to build, train, and deploy ML models. This framework is used by ABBYY engineers for computer vision and natural language processing tasks, including image preprocessing, classification, document layout analysis, OCR, and data extraction from structured and unstructured documents.
Key features:

Arabic OCR

ABBYY FineReader Engine has received new neural network technologies that significantly increase Arabic recognition accuracy, correct the misrecognition of European-language insertions in Arabic text, fix lost text strings, and reduce excessive calls.

Japanese OCR: support of single '℃' Unicode character

To improve the recognition of symbols in Japanese text, OCR now corrects the output document so that the single '℃' Unicode character (U+2103) is produced instead of the two separate characters '°' and 'C'.
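The character-level difference can be shown with a short sketch of the substitution, assuming plain-text post-processing of OCR output (the function name `merge_degree_celsius` is hypothetical; ABBYY's correction happens inside the OCR pipeline). It also shows a practical caveat: Unicode NFKC normalization maps U+2103 back to the two-character form, so a downstream NFKC step would undo the change.

```python
import unicodedata

DEGREE_CELSIUS = "\u2103"  # '℃' DEGREE CELSIUS

def merge_degree_celsius(text):
    """Replace the two-character sequence '°C' (U+00B0 U+0043)
    with the single character '℃' (U+2103)."""
    return text.replace("\u00b0C", DEGREE_CELSIUS)

original = "気温は25°Cです"
fixed = merge_degree_celsius(original)
print(fixed)  # -> 気温は25℃です
# NFKC decomposes U+2103 back into '°' + 'C'
print(unicodedata.normalize("NFKC", fixed) == original)  # -> True
```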

PowerPoint export improvements

ABBYY FineReader Engine now converts documents to presentation formats more accurately, with better layout preservation and a correct appearance of the output:

Updated Adobe PDF Library version

The Adobe PDF Library (APDFL) used in the product was updated to version 18. This update fixes security issues in the Unicode components of APDFL.

Victoria Dvornikova
