Why use FlexiCapture Technology for Data Extraction?

Question

What are the benefits of using FlexiCapture technologies?

Answer

The term “ABBYY FlexiCapture” is used in different contexts:

As the name of a product line of professional applications (Standalone, Distributed) that process forms, classify documents and extract data. The structured information can then be used in business processes. Further details on the FlexiCapture products and solutions are available in English, German, and French.
As the FlexiCapture SDK, a toolkit for developers that includes technologies for processing structured and semi-structured documents.
As FlexiLayouts and FlexiLayout Studio, tools for developing document definitions and the extraction logic used to locate data in unstructured or semi-structured documents.

Difference between data extraction via full-text OCR with custom rule-based parsing and the ABBYY FlexiCapture Technology

Parsing Full-Text OCR Results

Simple data extraction methods that work on “plain OCR text” files often use regular expressions.
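To make the regular-expression approach concrete, here is a minimal Python sketch. The sample OCR text, the field names, and the patterns are assumptions made up for illustration; they are not part of any ABBYY product or API.

```python
import re

# Hypothetical plain-text OCR output of an invoice (made up for illustration).
ocr_text = """ACME Corp.
Invoice No: INV-2041
Date: 12.03.2021
Total: 1,250.00 EUR
"""

# One pattern per field; each captures the value that follows a keyword.
PATTERNS = {
    "invoice_number": re.compile(r"Invoice\s+No[.:]?\s*(\S+)", re.IGNORECASE),
    "date": re.compile(r"Date[.:]?\s*(\d{2}\.\d{2}\.\d{4})"),
    "total": re.compile(r"Total[.:]?\s*([\d.,]+)"),
}


def extract_fields(text):
    """Return the first match of every pattern, or None when a field is missing."""
    result = {}
    for name, pattern in PATTERNS.items():
        match = pattern.search(text)
        result[name] = match.group(1) if match else None
    return result


fields = extract_fields(ocr_text)
```

The patterns only see a stream of characters. If the OCR reading order separates a keyword from its value, for example in a two-column layout, a rule like this can silently miss the field.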

However, the layout of each document contains valuable additional information that OCR combined with regular expressions does not use. Besides logos, images, and text areas, this includes lines, the position of text, specific keywords, and gaps between elements; all of this information is available in the structure of each document. Using ABBYY OCR technology, developers can access the layout structure and the coordinates of certain regions:

Text
Images
Barcodes
Tables

The information is accessible via:

FineReader Engine layout object - “live” access during document processing
the ABBYY XML - “offline” access to parse the information
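As a rough illustration of why coordinates help, the following Python sketch finds the value printed to the right of a keyword by comparing bounding boxes. The `Word` record, the `value_right_of` helper, and the sample coordinates are hypothetical; they stand in for whatever word-level layout data is read from the layout object or the XML and do not reproduce the actual FineReader Engine API or the ABBYY XML schema.

```python
from dataclasses import dataclass


@dataclass
class Word:
    # Hypothetical word record with a pixel bounding box.
    text: str
    left: int
    top: int
    right: int
    bottom: int


def value_right_of(words, keyword, max_gap=400):
    """Return the nearest word to the right of `keyword` on the same text line."""
    anchors = [w for w in words if w.text.rstrip(":").lower() == keyword.lower()]
    if not anchors:
        return None
    anchor = anchors[0]
    candidates = [
        w for w in words
        if w is not anchor
        and w.left >= anchor.right                 # starts right of the keyword
        and w.left - anchor.right <= max_gap       # not too far away
        # bounding boxes overlap vertically, i.e. same visual line
        and min(w.bottom, anchor.bottom) - max(w.top, anchor.top) > 0
    ]
    if not candidates:
        return None
    return min(candidates, key=lambda w: w.left).text


# Made-up layout: labels in a left column, values in a right column.
words = [
    Word("Date:", 100, 100, 170, 130),
    Word("12.03.2021", 400, 101, 540, 130),
    Word("Total:", 100, 900, 180, 930),
    Word("1,250.00", 400, 902, 520, 931),
    Word("Page", 400, 1000, 460, 1030),
]

total = value_right_of(words, "Total")
date = value_right_of(words, "date")
```

Even this small rule has to hard-code thresholds such as `max_gap`, which hints at how quickly hand-written layout logic grows once more fields and document variants are involved.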

In addition to the above approach, the ABBYY FlexiCapture Technology offers further advantages:

“Simple” full-text parsing is often not explicit enough
- FlexiCapture Technology provides more details
Writing hard-coded extraction logic can be complicated and the written code might be difficult to maintain.
- ABBYY FlexiCapture Technology is a productized technology that comes with training & support. In addition, there are many ABBYY-certified experts with advanced knowledge of data capture.
Dealing with text and layout coordinates in pure code might feel very abstract
- ABBYY FlexiCapture Technology comes with
  - Visual Development Tools
  - Logic to deal with graphical elements
  - Logic to use the relation between objects
  - Automatic Hypothesis Analysis – that also allows optional elements

The illustration below shows the differences between the document analysis of an invoice (using the full-text OCR approach) and the same document processed using the FlexiLayout (with access to the individual invoice lines).

Further Benefits

FlexiCapture Technology also provides:

Built-in support for multi-page documents
Detection of complex document structures with repeatable elements - e.g. insurance contracts with a form for each family member, embedded in a standard set of other pages
Scripting support to define what information is needed

Nikolai Kromm
