Field identification

I was thinking of using the service to generate records from received invoices, which carry the same information in many different formats, including free-text documents written by consultants. Since they all share the fields required by law, I planned to use regExp queries, but I found that field extraction works well only when the field position is specified accurately.

Are you going to work toward identifying fields by their expected characteristics, such as nearby titles or the expected type and size, or should I plan on mapping the entire page to a database of words and their positions and managing the search myself?





  • Dmitry


    The current Cloud OCR SDK API handles text recognition only, whether for the full text or for a single field. It does nothing to locate a zone on an image.

    We have a data capture SDK (ABBYY FlexiCapture Engine) that has the required capability. It is not available in the Cloud yet, but we are considering it. That will take some time.

    Right now I see two possible ways of doing what you want:

    1. Do full-text recognition and then apply a regExp search to the recognized text.
    2. If the data you work with is structured or semi-structured, you can pre-sort it and then apply known field layouts. Pre-sorting can be done with full-text recognition followed by a keyword search. To save time and effort, only part of a document needs to be OCRed (the first page of a multi-page document, or a zone of a single-page document).
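
    As a rough sketch of both options, assuming the recognized text is already available as a plain string: the sample text, field names, patterns, layout names and keywords below are all hypothetical, and none of them is part of the Cloud OCR SDK API. Real invoices would need patterns tuned to their actual wording.

```python
import re

# Hypothetical patterns for option 1; adjust them to the wording of real invoices.
FIELD_PATTERNS = {
    "invoice_number": re.compile(
        r"invoice\s*(?:no\.?|number|#)\s*:?\s*([A-Z0-9][A-Z0-9/-]*)", re.IGNORECASE
    ),
    "date": re.compile(
        r"\bdate\s*:?\s*(\d{1,2}[./-]\d{1,2}[./-]\d{2,4})", re.IGNORECASE
    ),
    "total": re.compile(
        r"\btotal(?:\s+amount)?(?:\s+due)?\s*:?\s*([0-9]+(?:[.,][0-9]{2})?)",
        re.IGNORECASE,
    ),
}

# Hypothetical keyword sets for option 2 (pre-sorting documents by layout).
LAYOUT_KEYWORDS = {
    "consultant_freetext": ["consulting services"],
    "supplier_form": ["purchase order", "ship to"],
}


def extract_fields(recognized_text):
    """Return the first match of each pattern, or None where nothing matched."""
    fields = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = pattern.search(recognized_text)
        fields[name] = match.group(1) if match else None
    return fields


def guess_layout(recognized_text):
    """Return the first layout whose keywords all occur in the text, else None."""
    lowered = recognized_text.lower()
    for layout, keywords in LAYOUT_KEYWORDS.items():
        if all(keyword in lowered for keyword in keywords):
            return layout
    return None


# Made-up text standing in for the output of full-text recognition.
SAMPLE_TEXT = """Consulting services rendered in March.
Invoice No: 2012-0457
Date: 14.03.2012
Total amount due: 1250.00 EUR
"""

if __name__ == "__main__":
    print(guess_layout(SAMPLE_TEXT))
    print(extract_fields(SAMPLE_TEXT))
```

    A field that matches no pattern comes back as None, so such documents can be set aside for manual review rather than silently producing a wrong record.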

    Best regards, Dmitry. ABBYY, Lead Product Analyst, SDK products.

  • Andrey Isaev

    Actually, ABBYY has been working in that direction for a long time. We have a product called FlexiCapture and an SDK called FlexiCapture Engine. Both solve the task you have just described: they help extract particular data from semi-structured documents. Using FlexiLayout Studio, you can define the fields you want to extract and the rules for locating them on an image. This goes beyond regular expressions: it can define complicated dependencies with voting among different layout hypotheses, and even cross-checking of fields and database look-ups for values.

    Unfortunately, this is not yet available in the Cloud, since it requires special training in FlexiLayout programming.

    So please contact your nearest ABBYY representative to talk about the FlexiCapture product or Engine.

