Problems with detecting areas and area types

I have two problems with ABBYY's area detection in the OCR Editor (using the latest version of ABBYY FineReader PDF 15).

The first problem is that, on pages with both images and text, image areas sometimes extend too far and take in pieces of the neighboring text, so those words aren't recognized by OCR. There are too many pages in the files I'm working with to fix this manually. Is there a way to disable image area detection in the OCR Editor and have ABBYY look only for text areas?

The second problem is with text area detection itself. Instead of a few large text areas covering the articles and paragraphs on a page, ABBYY creates many small boxes that follow seemingly random divisions within the text. Some of these boxes contain only two or three words, and other sections of text between these small areas go completely undetected.

Not only does this make the page harder to work with, it also makes ABBYY slower to analyze and load pages and to move between them.

Again, there are too many pages to fix all of this manually. Templates aren't an option because the pages have no standard layout to build one around, and converting the pages to black and white had no effect on how text areas were detected. Is there any way to improve how ABBYY detects text areas?

Any help for either question would be much appreciated!
