Outputting selected columns to a CSV file

Hi - we have a 10-column PDF document and want to extract only selected columns and re-order them.
We can get most of this right (selecting the columns, re-ordering them, etc.) and the preview looks correct; the problem is the export to the final format.
I use the Editable Copy setting.
The output works fine for Word, where the selected columns end up next to each other and are properly formatted:
ColumnA  ColumnB  ColumnC etc

It does not work well for Excel or CSV, where each column ends up below the previous column instead of next to it:
ColumnA 
ColumnB
ColumnC etc

We are working on improving the workflow in goods receiving and need to convert various suppliers' incoming documents to a standard layout for import into our systems. CSV is the preferred import format.
Any suggestions will be much appreciated!

Thanks in advance ;-)

Comments

1 comment

  • Yuriy Korotkevych

    Hi Colin,

    In general, it looks like your task is beyond the capabilities of the desktop FineReader PDF. I'd recommend taking a look at our IDP automation platform, specifically ABBYY Vantage, which offers document processing and data extraction skills for various types of documents. If you'd like to learn more and discuss a solution for your task with our Vantage experts, please fill in the request form on the page and they'll get back to you.

    As for the problem you're having when using FineReader PDF for this task: Editable Copy mode cannot be used for saving to Excel or CSV. When you switch to these formats, the mode changes to Formatted Text, even if you previously set it to Editable Copy, for example when saving to Word. If my assumption about what you're trying to do is correct, the following happens. When you draw separate table areas around only the selected columns of a table (in an attempt to exclude the other, unnecessary columns from the output), FineReader treats them as separate tables and therefore places them one under another in the "Formatted Text" output that it uses for Excel or CSV. There's no way to tell FineReader that those table areas are parts of one table.

    To get rid of the unnecessary data in the output, you can try deleting it in the Text window (the right-hand window in the OCR Editor) before saving the result, but the output xlsx or csv will still keep empty "placeholders" for that data. As I mentioned above, FineReader PDF isn't designed for automated data extraction tasks.
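
    If the empty placeholders are the only remaining obstacle, one possible workaround is to clean up the exported CSV outside FineReader. The following is only a rough sketch in Python, not a FineReader feature: it assumes the whole table was exported as one CSV with a header row, that the deleted columns come through as cells that are blank in every row, and that the delimiter is a comma. The function names and the sample headers (ColumnA, ColumnB, ColumnC) are made up for illustration.

```python
import csv
import io


def drop_empty_columns(rows):
    """Remove columns whose cells are blank in every row."""
    if not rows:
        return []
    width = max(len(row) for row in rows)
    # Pad short rows so every row has the same number of cells.
    padded = [row + [""] * (width - len(row)) for row in rows]
    keep = [i for i in range(width) if any(r[i].strip() for r in padded)]
    return [[r[i] for i in keep] for r in padded]


def reorder_columns(rows, wanted):
    """Keep only the columns named in `wanted`, in that order.

    Assumes the first row holds the column headers.
    """
    header = [h.strip() for h in rows[0]]
    # header.index raises ValueError if a requested name is missing.
    index = [header.index(name) for name in wanted]
    return [[row[i] for i in index] for row in rows]


def clean_csv(text, wanted):
    """Drop blank rows and blank columns, then select and reorder columns."""
    reader = csv.reader(io.StringIO(text))
    rows = [r for r in reader if any(cell.strip() for cell in r)]
    rows = drop_empty_columns(rows)
    rows = reorder_columns(rows, wanted)
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(rows)
    return out.getvalue()
```

    For example, calling clean_csv on the text of the exported file with wanted set to ["ColumnC", "ColumnA"] would return CSV text containing just those two columns in that order. Real supplier documents will probably need extra handling (different delimiters, merged cells, OCR errors in the headers), so treat this as a starting point.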
