Consolidate Data Extracted from Multiple Document Types into One Output

Hello, I am trying FlexiCapture 11. How can I implement customized logic to consolidate data from multiple document types and export the data into a single output?

For example (see attached image), we would like to turn a PDF document pouch file, which contains images of three document types (House Bill of Lading, Invoice, and Packing List), into a single CSV output file containing the data extracted from all three document types.

I would appreciate it if someone could shed some light on how to do this with FlexiCapture, or which functions I should look into. Thanks in advance.




  • Adrian Enders

    Hi Alex. I thought this was an interesting question. Where is the data for the documents exported to? Without knowing more details, why would you want to do this in FlexiCapture? The output you want could easily be constructed with database queries after the document index values have been exported: just export the data from each document to a set of SQL database tables, then create a VIEW that joins the information needed for the CSV file.
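    As a minimal sketch of that database approach, assuming each document type is exported to its own table and all three share a common key, the Python below uses the standard-library sqlite3 module to create three tables, a VIEW that joins them, and a CSV dump of the view. The table names, column names, and the shipment_ref key are hypothetical; a real setup would use whatever database the export targets and whichever field actually links the three documents.

```python
import csv
import io
import sqlite3

# Hypothetical schema: one table per document type, all sharing a
# shipment_ref key that the three documents have in common.
SCHEMA = """
CREATE TABLE bill_of_lading (shipment_ref TEXT PRIMARY KEY, hbl_number TEXT, consignee TEXT);
CREATE TABLE invoice        (shipment_ref TEXT PRIMARY KEY, invoice_number TEXT, total_amount REAL);
CREATE TABLE packing_list   (shipment_ref TEXT PRIMARY KEY, package_count INTEGER, gross_weight REAL);

-- The VIEW joins the three tables into one row per shipment.
-- Explicit aliases make the output column names well defined.
CREATE VIEW shipment_summary AS
SELECT b.shipment_ref   AS shipment_ref,
       b.hbl_number     AS hbl_number,
       b.consignee      AS consignee,
       i.invoice_number AS invoice_number,
       i.total_amount   AS total_amount,
       p.package_count  AS package_count,
       p.gross_weight   AS gross_weight
FROM bill_of_lading AS b
JOIN invoice        AS i ON i.shipment_ref = b.shipment_ref
JOIN packing_list   AS p ON p.shipment_ref = b.shipment_ref;
"""


def export_view_to_csv(conn: sqlite3.Connection, out) -> None:
    """Write every row of shipment_summary, with a header row, as CSV."""
    cur = conn.execute("SELECT * FROM shipment_summary ORDER BY shipment_ref")
    writer = csv.writer(out)
    writer.writerow([col[0] for col in cur.description])
    writer.writerows(cur)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    # Sample rows standing in for what the capture export would insert.
    conn.execute("INSERT INTO bill_of_lading VALUES ('S1', 'HBL-001', 'ACME Ltd')")
    conn.execute("INSERT INTO invoice VALUES ('S1', 'INV-100', 2500.0)")
    conn.execute("INSERT INTO packing_list VALUES ('S1', 12, 340.5)")
    buf = io.StringIO()
    export_view_to_csv(conn, buf)
    print(buf.getvalue())
```

    The inner JOINs mean a shipment only appears in the CSV once all three of its documents have been exported; a LEFT JOIN starting from bill_of_lading would instead emit partial rows.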

    What is a "PDF ... pouch file"? I am not familiar with that term. Is it just a single PDF document containing these document types? Is that PDF the input file that FlexiCapture is processing? Are you planning to export these three document types separately as images, or do you not need the document images at all and just want the data?

    Even if you don't have a database and are exporting the data to three separate text files, I would simply use Excel to build the CSV file by importing the files and running queries on the data. There is an example here.
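    If a scripted alternative to the Excel approach is ever preferred, the same join can be done directly over the three exported text files. The sketch below assumes each export is a CSV file with a header row and a shared key column (the column name "ref" in the example is hypothetical), and keeps only keys present in all three files, like an inner join.

```python
import csv
import io


def load_by_key(f, key):
    """Read one CSV export; return its header and its rows indexed by key."""
    reader = csv.DictReader(f)
    rows = {row[key]: row for row in reader}
    return list(reader.fieldnames or []), rows


def consolidate(bol_file, invoice_file, packing_file, key, out):
    """Join three per-document-type CSV exports into one CSV.

    Writes one row per key found in all three inputs (an inner join).
    If a non-key column name repeats, the later file's value wins.
    """
    fieldnames = []
    tables = []
    for f in (bol_file, invoice_file, packing_file):
        header, rows = load_by_key(f, key)
        fieldnames += [c for c in header if c not in fieldnames]
        tables.append(rows)
    common = set(tables[0]) & set(tables[1]) & set(tables[2])
    writer = csv.DictWriter(out, fieldnames=fieldnames)
    writer.writeheader()
    for k in sorted(common):
        writer.writerow({**tables[0][k], **tables[1][k], **tables[2][k]})


if __name__ == "__main__":
    bol = io.StringIO("ref,hbl_number\nS1,HBL-001\nS2,HBL-002\n")
    inv = io.StringIO("ref,invoice_number\nS1,INV-100\n")
    pack = io.StringIO("ref,package_count\nS1,12\nS2,3\n")
    out = io.StringIO()
    consolidate(bol, inv, pack, "ref", out)
    print(out.getvalue())
```

    With real files, each input should be opened with open(path, newline=""), as the csv module documentation recommends.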

    If you absolutely have to do this in FlexiCapture, and all the documents are in one PDF file, then just treat all three documents as one document type: create a single document definition containing all the fields. When you configure export, you then select only the fields you want to export. See my earlier question about what you are doing with the document images.

    If the above doesn't work because you are separating these documents and exporting the images, and assuming you don't have a database to export to, then you will want to create a custom export step in the workflow. This requires some coding: a batch-level script that iterates through all of the documents, collects the data you need, and exports it to a CSV file. There is an example on the FC knowledge base web site here showing how to set up custom export scripts.
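    FlexiCapture's own scripting object model is not reproduced here. As an illustration of the logic only, the Python sketch below models a batch as a plain list of documents, each with a document type and its extracted field values, and folds them into a single CSV row. The Document class, the FIELD_MAP field and column names, and the assumption that one batch (one input PDF) corresponds to one output row are all hypothetical.

```python
from __future__ import annotations

import csv
import io
from dataclasses import dataclass, field


@dataclass
class Document:
    """Hypothetical stand-in for one classified document in a batch."""
    doc_type: str
    fields: dict[str, str] = field(default_factory=dict)


# Hypothetical mapping: which fields to take from each document type,
# and the CSV column each one should land in.
FIELD_MAP = {
    "House Bill of Lading": {"HBLNumber": "hbl_number", "Consignee": "consignee"},
    "Invoice": {"InvoiceNumber": "invoice_number", "Total": "total_amount"},
    "Packing List": {"Packages": "package_count"},
}
COLUMNS = [col for mapping in FIELD_MAP.values() for col in mapping.values()]


def batch_to_row(documents):
    """Iterate over every document in the batch and collect one output row."""
    row = dict.fromkeys(COLUMNS, "")
    for doc in documents:
        # Document types not listed in FIELD_MAP are ignored.
        for src, dst in FIELD_MAP.get(doc.doc_type, {}).items():
            row[dst] = doc.fields.get(src, "")
    return row


def export_batch(documents, out):
    """Write a header and the single consolidated row for the batch as CSV."""
    writer = csv.DictWriter(out, fieldnames=COLUMNS)
    writer.writeheader()
    writer.writerow(batch_to_row(documents))
```

    Because each document type contributes its own columns, the order of documents within the batch does not matter, and a field missing from its document simply leaves an empty cell.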

    I would try not to do this in FlexiCapture if at all possible, especially if you only need the data. In my opinion the capture process should focus on what it does best: classification, separation, and data extraction. Anything else involving the exported images and data should be done after they are released from FC. Hope this helps.

  • Alex Cheng

    Hi Adrian, thanks for your advice; it is very useful!

    Please let me clarify the requirements as follows:

    Input: A single PDF document which contains scanned images of the three document types.

    Output: A single CSV file in which the data is extracted and consolidated as described above. This CSV file will then be uploaded to other in-house applications to reduce manual input. No images are needed.

    At first I had no idea whether or how FC supports this kind of data transformation, but now I have a better understanding of the possible options. I agree to let FC focus on what it does best, and I will first explore the option to "export the data from each document to a set of SQL database tables".
