The term “ABBYY FlexiCapture” is used in different contexts:
- As the name of a product line of professional applications (Standalone, Distributed) that process forms, classify documents and extract data. The structured information can then be used in business processes. Further details on the FlexiCapture products and solutions are available in English, German and French.
- The FlexiCapture SDK is a toolkit for developers that includes technologies for processing structured and semi-structured documents.
- FlexiLayouts and FlexiLayout Studio are tools for developing document definitions and the extraction logic used to find data in unstructured or semi-structured documents.
Difference between data extraction via full-text OCR with custom rule-based parsing and the ABBYY FlexiCapture Technology
A) Parsing full-text OCR results
Simple data extraction methods that work on “plain OCR text” files often use regular expressions (Wikipedia). However, the layout of each document contains valuable additional information that OCR plus regular expressions does not use. Besides knowing which areas are logos, images and text, further details can be useful, such as lines, the position of text, specific keywords and the gaps between elements. All of this information is available in the structure of each document's content.
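As a rough illustration of the plain-text approach described above, the following Python sketch applies one regular expression per field to a piece of OCR output. The sample text, the field names and the patterns are invented for this example and are not taken from any ABBYY product.

```python
import re

# Hypothetical plain-text OCR output of an invoice (invented for illustration).
ocr_text = """ACME GmbH
Invoice No: INV-2041
Date: 12.03.2015
Total: 1,250.00 EUR
"""

# One pattern per field; each relies only on the wording, not on the layout.
PATTERNS = {
    "invoice_number": re.compile(r"Invoice\s+No[.:]?\s*(\S+)"),
    "date": re.compile(r"Date:\s*(\d{2}\.\d{2}\.\d{4})"),
    "total": re.compile(r"Total:\s*([\d.,]+)"),
}


def extract_fields(text):
    """Return the first match for every pattern, or None if the field is absent."""
    fields = {}
    for name, pattern in PATTERNS.items():
        match = pattern.search(text)
        fields[name] = match.group(1) if match else None
    return fields


print(extract_fields(ocr_text))
```

The patterns only work as long as the wording stays the same. A supplier that prints “Amount due” instead of “Total” needs a new pattern, and nothing in the text says where on the page a value was found.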
Using ABBYY OCR technology, developers can access the layout structure and the coordinates of individual regions. This information is accessible via:
- the FineReader Engine layout object: “live” access during document processing
- the ABBYY XML: “offline” access to parse the information afterwards
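The “offline” route could look roughly like the Python sketch below, which reads text lines and their coordinates from an XML string. The sample is a simplified stand-in modelled on an ABBYY XML export: the element names (`page`, `block`, `line`, `formatting`) and the attributes (`blockType`, `l`, `t`, `r`, `b`) are assumptions for this example, and real exports typically declare an XML namespace that the lookups would have to take into account.

```python
import xml.etree.ElementTree as ET

# Simplified sample; element and attribute names are assumptions for illustration.
SAMPLE_XML = """<document>
  <page width="2480" height="3508">
    <block blockType="Text" l="150" t="200" r="900" b="330">
      <line l="150" t="200" r="520" b="250"><formatting>Invoice No:</formatting></line>
      <line l="150" t="280" r="900" b="330"><formatting>Total: 1,250.00 EUR</formatting></line>
    </block>
    <block blockType="Picture" l="1800" t="100" r="2300" b="400"/>
  </page>
</document>"""


def read_text_lines(xml_string):
    """Collect the text and bounding box of every line inside a text block."""
    root = ET.fromstring(xml_string)
    lines = []
    for page_index, page in enumerate(root.iter("page")):
        for block in page.iter("block"):
            if block.get("blockType") != "Text":
                continue  # skip pictures and other non-text regions
            for line in block.iter("line"):
                lines.append({
                    "page": page_index,
                    "text": "".join(line.itertext()).strip(),
                    # (left, top, right, bottom) in page coordinates
                    "box": tuple(int(line.get(k)) for k in ("l", "t", "r", "b")),
                })
    return lines


for entry in read_text_lines(SAMPLE_XML):
    print(entry)
```

With the coordinates available, a parser can use the position of a text line as well as its wording.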
In addition to the above approach, the ABBYY FlexiCapture Technology offers further advantages:
“Simple” full-text parsing is often not explicit enough -> FlexiCapture Technology provides more details
Writing “hard-coded” extraction logic can be complicated, and the resulting code may be difficult to maintain -> ABBYY FlexiCapture Technology is a productised technology that comes with training and support. In addition, there are many ABBYY certified experts with advanced knowledge of data capture.
Dealing with text and layout coordinates in pure code can feel very abstract -> ABBYY FlexiCapture Technology comes with:
- Visual development tools
- Logic to deal with graphical elements
- Logic to use the relation between objects
- Automatic hypothesis analysis, which also allows for optional elements
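To show what “text and layout coordinates in pure code” can look like, here is a minimal hand-written Python sketch that finds the value to the right of a keyword on the same text row. It is not FlexiCapture or FlexiLayout code; the `Word` class, the `value_right_of` function, the `max_gap` threshold and the sample coordinates are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    left: int
    top: int
    right: int
    bottom: int


def value_right_of(words, keyword, max_gap=600):
    """Return the nearest word to the right of `keyword` on the same text row."""
    anchor = next((w for w in words if w.text == keyword), None)
    if anchor is None:
        return None  # keyword not on the page; treat the field as optional
    candidates = [
        w for w in words
        if w is not anchor
        and w.left >= anchor.right                             # starts to the right
        and w.left - anchor.right <= max_gap                   # not too far away
        and w.top < anchor.bottom and w.bottom > anchor.top    # rows overlap vertically
    ]
    return min(candidates, key=lambda w: w.left, default=None)


# Hypothetical recognised words with pixel coordinates.
words = [
    Word("Date:", 150, 200, 280, 250),
    Word("12.03.2015", 700, 202, 900, 248),
    Word("Total:", 150, 280, 300, 330),
    Word("1,250.00", 700, 282, 860, 328),
    Word("EUR", 880, 282, 960, 328),
]

print(value_right_of(words, "Total:"))
```

Even this single rule needs a hand-tuned gap threshold and an overlap test, and every additional rule has to be written and maintained in the same way.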
The illustration below shows the difference between the document analysis of an invoice using the full-text OCR approach and the same document processed using a FlexiLayout (with access to the individual invoice lines).
FlexiCapture Technology also provides:
- Built-in support for multi-page documents
- Detection of complex document structures with repeatable elements, e.g. insurance contracts with a form for each family member, embedded in a standard set of other pages
- Scripting support to define what information is needed