I'm seeing inconsistencies between the layout of an image and the text output from Abbyy.
I'm testing with an image of an invoice that has a tabular layout. My post-processing logic treats runs of whitespace as separators and uses them to extract sections of the data for input into another program.
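A minimal sketch of this kind of whitespace-based splitting, in Python. The function name `split_columns`, the sample line, and the threshold of two or more spaces are assumptions for illustration, not the actual post-processing code:

```python
import re

def split_columns(line, min_gap=2):
    """Split one line of OCR text into fields.

    A run of at least `min_gap` spaces is treated as a column separator,
    so single spaces inside a field (e.g. "Widget A") are kept.
    """
    pattern = r" {%d,}" % min_gap
    return [field for field in re.split(pattern, line.strip()) if field]

# Hypothetical invoice row, not taken from the real document.
sample = "Widget A      2     9.99     19.98"
print(split_columns(sample))  # ['Widget A', '2', '9.99', '19.98']
```

This only works if the OCR text keeps the column gaps from the image, which is why the lost whitespace is a problem.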
However, the horizontal whitespace is not always preserved, and Abbyy seems to have its own idea of how to format the text in a tabular way: it seems to recognize that there is a table, but it aligns and groups some of the wrong data together horizontally.
Here's an image of the before and after: http://screencast.com/t/wYQI8h48I - the output from Abbyy is in the notepad document at the top, the source image is below.
I know that I can export as XML, which apparently records the position of every character accurately, but then I would need to write a program to recompose the document into a text format, which is what the text output from Abbyy should already be providing!
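For reference, a rough sketch of what such a recomposition step could look like in Python. It does not parse Abbyy's XML; it assumes the character positions have already been extracted into `(character, left, top)` tuples in pixels. That tuple format, the function name `recompose`, and the fixed `char_width` and `line_height` values are all hypothetical:

```python
from collections import defaultdict

def recompose(chars, char_width, line_height):
    """Place characters on a monospaced text grid from pixel positions.

    chars: iterable of (character, left, top) tuples (hypothetical format).
    char_width, line_height: assumed size of one grid cell in pixels.
    """
    rows = defaultdict(dict)
    for ch, left, top in chars:
        row = round(top / line_height)   # which text line
        col = round(left / char_width)   # which column on that line
        rows[row][col] = ch

    lines = []
    for row in sorted(rows):
        cells = rows[row]
        width = max(cells) + 1
        # Fill every column with no character with a space.
        lines.append("".join(cells.get(c, " ") for c in range(width)))
    return "\n".join(lines)

# Hypothetical positions, not taken from a real export.
chars = [("A", 0, 0), ("B", 10, 0), ("1", 50, 0),
         ("C", 0, 20), ("2", 50, 20)]
print(recompose(chars, char_width=10, line_height=20))
```

A real document would need the cell size estimated from the data, and proportional fonts or skewed scans would make the mapping less clean than this.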
Another note, though less important to me: all of the horizontal lines go missing when processed by Abbyy. As you can see from the image, they are made up of hyphens and equals signs, which are valid characters. Why are they all stripped?