Why use FlexiCapture Technology for Data Extraction?

The term “ABBYY FlexiCapture” is used in different contexts:

  • As the name of a product line of professional applications (Standalone, Distributed) that process forms, classify documents and extract data. The structured information can then be used in business processes. Further details on the FlexiCapture products and solutions can be found here: English, German, French.
  • As part of the name FlexiCapture SDK, a toolkit for developers that includes technologies for processing structured and semi-structured documents.
  • In connection with FlexiLayouts and FlexiLayout Studio, tools for developing document definitions and the extraction logic used to locate data in unstructured or semi-structured documents.

Difference between data extraction via full-text OCR with custom rule-based parsing and use of the ABBYY FlexiCapture Technology


A) Parsing full-text OCR results

Simple data extraction methods that work on “plain OCR text” files often use regular expressions (Wikipedia). However, the layout of each document contains valuable additional information that OCR plus regular expressions does not use. Besides knowing which areas are logos, images and text, further details can be useful, such as lines, the position of text, specific keywords and the gaps between elements. All of this information is available in the structure of each document's content.
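
As an illustration of the regular-expression approach described above, here is a minimal Python sketch. The sample text, the field names and the patterns are all made up for this example and do not come from any ABBYY product. It only shows that this style of extraction sees a stream of characters and nothing about where the text sits on the page.

```python
import re

# Hypothetical plain-text OCR output of an invoice (invented for this example).
ocr_text = """ACME Corp.
Invoice No: INV-2041
Date: 2015-03-17
Total: 1,250.00 EUR
"""

# One pattern per field; the field names and formats are illustrative assumptions.
PATTERNS = {
    "invoice_number": re.compile(r"Invoice\s+No[.:]?\s*(\S+)"),
    "date": re.compile(r"Date:\s*(\d{4}-\d{2}-\d{2})"),
    "total": re.compile(r"Total:\s*([\d.,]+)"),
}


def extract_fields(text):
    """Return the first match of each pattern, or None when nothing matches."""
    fields = {}
    for name, pattern in PATTERNS.items():
        match = pattern.search(text)
        fields[name] = match.group(1) if match else None
    return fields


print(extract_fields(ocr_text))
```

If a supplier prints the label differently, or the value ends up on another line of the text output, the pattern silently returns nothing. That is the maintenance problem that layout-aware extraction tries to reduce.
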
Using ABBYY OCR technology, developers can access the layout structure and the coordinates of certain regions:
  • Text
  • Images
  • Barcodes
  • Tables
The information is accessible via:
  • FineReader Engine layout object - “live” access during document processing
  • the ABBYY XML - “offline” access to parse the information
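
The “offline” route can be sketched in a few lines of Python. The XML fragment below is a simplified, hypothetical stand-in for an OCR export: the element name `block` and the attributes `blockType`, `l`, `t`, `r`, `b` are assumptions made for this example, and a real ABBYY XML file uses a namespace and a much richer schema that should be checked against the product documentation.

```python
import xml.etree.ElementTree as ET

# Simplified, hypothetical fragment (assumed structure, not a real export).
SAMPLE_XML = """<document>
  <page width="2480" height="3508">
    <block blockType="Picture" l="100" t="80" r="600" b="300"/>
    <block blockType="Text" l="1500" t="120" r="2300" b="200">Invoice No: INV-2041</block>
    <block blockType="Table" l="100" t="900" r="2380" b="2200"/>
  </page>
</document>"""


def list_blocks(xml_string):
    """Return (block type, (left, top, right, bottom), text) for every block."""
    root = ET.fromstring(xml_string)
    blocks = []
    for block in root.iter("block"):
        rect = tuple(int(block.get(side)) for side in ("l", "t", "r", "b"))
        text = (block.text or "").strip()
        blocks.append((block.get("blockType"), rect, text))
    return blocks


for block_type, rect, text in list_blocks(SAMPLE_XML):
    print(block_type, rect, text)
```
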

In addition to the above approach, the ABBYY FlexiCapture Technology offers further advantages:

  • “Simple” full-text parsing is often not explicit enough –> FlexiCapture Technology provides more details
  • Writing “hard-coded” extraction logic can be complicated, and the resulting code might be difficult to maintain –> ABBYY FlexiCapture Technology is a productised technology that comes with training & support. In addition, there are many ABBYY certified experts with advanced knowledge of data capture.
  • Dealing with text and layout coordinates in pure code might feel very abstract –> ABBYY FlexiCapture Technology comes with
    • Visual Development Tools
    • Logic to deal with graphical elements
    • Logic to use the relation between objects
    • Automatic Hypothesis Analysis, which also allows for optional elements
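
To give a feeling for what “dealing with coordinates in pure code” looks like, the following Python sketch finds a value by its position relative to a keyword and treats a missing keyword as an optional element. It is not the FlexiLayout language and not ABBYY's hypothesis analysis; the `Word` class, the `find_right_of` helper, the thresholds and the sample coordinates are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """A recognized word with its bounding box in pixels (hypothetical model)."""
    text: str
    left: int
    top: int
    right: int
    bottom: int


def find_right_of(words, keyword, max_gap=400, tolerance=15):
    """Return the nearest word to the right of `keyword` on the same line, or None."""
    anchor = next((w for w in words if w.text == keyword), None)
    if anchor is None:
        return None  # optional element: a missing keyword is not an error
    candidates = [
        w for w in words
        if w is not anchor
        and abs(w.top - anchor.top) <= tolerance       # roughly the same text line
        and 0 <= w.left - anchor.right <= max_gap      # to the right, not too far away
    ]
    return min(candidates, key=lambda w: w.left, default=None)


# Invented sample data for one invoice page.
words = [
    Word("Total:", 1500, 2300, 1650, 2340),
    Word("1,250.00", 1700, 2302, 1900, 2342),
    Word("EUR", 1920, 2302, 2000, 2342),
    Word("Page", 100, 3400, 180, 3440),
]

total = find_right_of(words, "Total:")
discount = find_right_of(words, "Discount:")  # this keyword is not on the page
print(total.text if total else None, discount)
```

Even this small example needs hand-tuned thresholds such as `max_gap` and `tolerance`, and every further rule (tables, separators, multiple candidates) adds more code of the same kind.
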

The illustration below shows the difference between the document analysis of an invoice using the full-text OCR approach and the same document processed using a FlexiLayout (with access to the individual invoice lines).




Further Benefits

FlexiCapture Technology also provides:

  • Built-in support for multi-page documents
  • Detection of complex document structures with repeatable elements, e.g. insurance contracts with a form for each family member, embedded in a standard set of other pages
  • Scripting support to define what information is needed
