
OCR Solution Investigation

I'm a newbie investigating a solution that meets two requirements:

  • Can FlexiCapture automatically detect a specific table (occupying one page) in a multi-page PDF file, extract the data from that table, and ignore the remaining pages to speed up OCR processing?
  • There are more than 10,000 types of PDF files, so creating a document definition for each type would be a formidable task. Can FlexiLayout or some other tool automatically recognize common fields such as name, address, number, and line items to reduce the work, so that I only need to add a new field to the template incrementally when a new type falls outside that scope?

Best regards.


Comments

2 comments

  • Permanently deleted user

    Hello,

    You can search for the specific table at the FlexiLayout level using unique identifiers, then treat all pages that do not contain these identifiers as unrecognized annexes.

    As for out-of-the-box auto-classification and extraction of some standard fields, this functionality is available for invoice-type documents (see the FlexiCapture for Invoices description). Otherwise, you'll need to prepare your own layout.
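The identifier-based page filtering suggested above can be illustrated outside the product with a minimal sketch. This is not FlexiCapture or FlexiLayout code. It assumes the text of each page is already available as a string (for example from the PDF text layer or a quick low-resolution OCR pass), and the function name `find_table_pages`, the sample identifiers, and the `min_hits` threshold are all hypothetical.

```python
from typing import Iterable, List, Sequence


def find_table_pages(
    page_texts: Sequence[str],
    identifiers: Iterable[str],
    min_hits: int = 2,
) -> List[int]:
    """Return 0-based indexes of pages containing at least `min_hits` identifiers.

    Hypothetical helper: matching is a plain case-insensitive substring test.
    """
    wanted = [s.casefold() for s in identifiers]
    hits = []
    for index, text in enumerate(page_texts):
        folded = text.casefold()
        matched = sum(1 for s in wanted if s in folded)
        if matched >= min_hits:
            hits.append(index)
    return hits


# Illustrative page texts (made up for this sketch).
pages = [
    "Cover letter. Dear customer, ...",
    "Item No.  Description  Qty  Unit Price  Amount",
    "Terms and conditions ...",
]

# Pages carrying the table's column headers are kept for full recognition;
# every other page is treated as an annex and skipped.
table_pages = find_table_pages(pages, ["Item No.", "Qty", "Unit Price"])
annex_pages = [i for i in range(len(pages)) if i not in table_pages]
```

Requiring more than one identifier per page reduces false positives when a single header word also appears in running text on other pages.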

  • Avatar
    Permanently deleted user

    Dear Ekaterina

    Thank you. I will try using identifiers first.

