I'm a newbie investigating a solution that meets two requirements:
- Can FlexiCapture automatically detect a specific table (occupying one page) in a multi-page PDF file, extract the data from that table, and ignore the remaining pages to speed up OCR?
- There are more than 10 thousand types of PDF files, so creating a document definition for each type would be a formidable task. Can FlexiLayout or some other tool automatically recognize common fields such as name, address, number, and line items to reduce the work, so that I only need to add a new field to the template incrementally when a new type falls outside that scope?
Best regards.
Comments
Hello,
You can search for the specific table at the FlexiLayout level using unique identifiers, then treat all pages that lack these identifiers as non-recognizable annexes.
As for out-of-the-box auto-classification and extraction of some standard fields: this functionality is available for invoice-type documents (see the FlexiCapture for Invoices description). Otherwise, you'll need to prepare your own layout.
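To illustrate the identifier idea outside of FlexiCapture itself, here is a minimal Python sketch. It is not FlexiLayout code and does not use any ABBYY API. It assumes the text of each page is already available as a string (for example, from a fast text-layer pass), and the function name `find_table_pages`, the `min_hits` threshold, and the sample identifiers are all hypothetical.

```python
from typing import List, Sequence


def find_table_pages(
    page_texts: Sequence[str],
    identifiers: Sequence[str],
    min_hits: int = 2,
) -> List[int]:
    """Return 0-based indexes of pages containing at least `min_hits` identifiers.

    Matching is a case-insensitive substring check, a stand-in for the
    static-text search elements a real layout would use.
    """
    wanted = [ident.casefold() for ident in identifiers]
    matches = []
    for index, text in enumerate(page_texts):
        folded = text.casefold()
        hits = sum(1 for ident in wanted if ident in folded)
        if hits >= min_hits:
            matches.append(index)
    return matches


# Hypothetical three-page document: only the middle page holds the table.
pages = [
    "Dear customer, please find the statement attached.",
    "Item  Description  Qty  Unit Price  Total\n1  Widget  2  5.00  10.00",
    "Terms and conditions apply. Total liability is limited.",
]
print(find_table_pages(pages, ["Description", "Qty", "Unit Price"]))  # [1]
```

Requiring more than one identifier per page (`min_hits=2` here) reduces the chance that a single common word such as "Total" on an unrelated page is mistaken for the table header. Pages that are not returned correspond to the ones that would be treated as annexes and skipped.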
Dear Ekaterina,
Thank you. I will try using identifiers first.