Community

EnableTextExtractionMode causes problems with number recognition

I'm extracting text from an image, and to extract as much text as possible from the image (a pdf document) i enable the EnableTextExtractionMode flag.

My problem is illustrated in this image (I've added the red marks to illustrate the problems). Commas are often confused with dots, colons or semicolons:


I need the EnableTextExtractionMode enabled to get all the text of the image, without it, text is sometimes confused for pictures. I can disable EnableTextExtractionMode and then also disable DetectPictures, but it is not quite as good at getting all the text as it is when EnableTextExtractionMode is enabled.

The problem with EnableTextExtractionMode is that it also enables the ProhibitModelAnalysis flag, and as far as i can tell that is what is causing my problems.

Just to make sure that it was the ProhibitModelAnalysis flag that was causing the problems, I tried to run the image through with EnableTextExtractionMode = false and ProhibitModelAnalysis = true, and that did indeed cause the same problem.

So my question is this: Is there anyway to get the additional text extraction provided by EnableTextExtractionMode without the reduced recognition quality from ProhibitModelAnalysis?

Was this article helpful?

0 out of 0 found this helpful

Comments

2 comments

  • Avatar
    Permanently deleted user

    Would you be so kind to send your original image for our tests to SDK_Support@abbyy.com?

    Also please specify the build number of your FineReader Engine distribution.

    0
  • Avatar
    Permanently deleted user

    Thank you for your reply.

    I have sent two pdf-files to SDK_Support@abbyy.com under the subject "EnableTextExtractionMode causes problems with number recognition". One pdf to illustrate the problem with pictures, and one to illustrate the problem with the numbers. I hope that makes sense.

    I'm using Fine Reader version 11.1.9.75

    0

Please sign in to leave a comment.