OCR skipping lines of text that can clearly be read. Answered

I'm running the Abbyy OCR engine in C# to process receipts, and it's been skipping large portions of the receipts that are clearly readable. I've created user trained patterns for receipts that have this problem, but this only resulted in 100% accuracy for the portions it processed. The lines of text it is skipping remained skipped after training.

I'm using the DocumentConvertion_Accuracy settings like this.

        FRDocument document = _engine.CreateFRDocument();
        // load some images here...
        const string patternFile = "C:\\abbyy_pattern.ptn";
        DocumentProcessingParams docParams = _engine.CreateDocumentProcessingParams();
        docParams.PageProcessingParams.RecognizerParams.UserPatternsFile = patternFile;

Here is a sample image illustrating the problem. The colored rectangles represent text Abbyy successfully processed. As you can see there is a large portion of the receipt missing OCR data.

As a test, I've modified the scan receipt in Photoshop to increase the line spacing in the problem area. After adding extra space between the lines Abbyy started generating OCR data for that area.

Obviously, I can't Photoshop every scan that has this problem and this problem is presenting itself in a significant percentage of our scans.

I have read over the documentation multiple times and can not find any settings that would indicate a solution to this problem.

Can anyone assist me?


Was this article helpful?

0 out of 0 found this helpful


1 comment

Please sign in to leave a comment.