Image Resolution Detection for OCR

When a document is scanned, the scanner sets the resolution in DPI. When a document is digitized with a digital camera, there is no inherent resolution information, because the effective resolution depends on the object distance, the focal length of the lens, the sensor size, and the sensor resolution.
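
As an illustration of that dependency, here is a minimal sketch based on the standard thin-lens approximation. It is not ABBYY's method; the function name and parameters are hypothetical, and a real lens will deviate from this idealized model.

```python
def effective_dpi(sensor_px, sensor_mm, focal_mm, distance_mm):
    """Estimate pixels per inch on the photographed page.

    Assumes an ideal thin lens and a page parallel to the sensor.
    sensor_px / sensor_mm are measured along the same sensor axis;
    distance_mm is the distance from the lens to the page.
    """
    if distance_mm <= focal_mm:
        raise ValueError("object must be farther away than the focal length")
    # Thin-lens magnification: size on sensor / size on page.
    magnification = focal_mm / (distance_mm - focal_mm)
    # Pixel density on the sensor itself.
    px_per_mm_on_sensor = sensor_px / sensor_mm
    # Pixel density projected onto the page, converted from mm to inches.
    return px_per_mm_on_sensor * magnification * 25.4


# Example: 6000 px across a 36 mm sensor, 50 mm lens, page 550 mm away.
print(round(effective_dpi(6000, 36, 50, 550)))
```

Under this model, moving the camera farther from the page lowers the effective DPI, which is why the same camera can produce very different resolutions for the same document.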

For OCR processing, ABBYY technologies are able to calculate the DPI, even for images that were created by digital cameras.

How ABBYY technologies calculate the DPI

Images that come from a digital camera are supplied with metadata in the Exchangeable Image File Format (EXIF). This metadata contains values such as focal length, aperture, and orientation (rotation).

While in theory the resolution can be calculated from this information, in reality some older imaging software products may not handle it properly.

If an image header contains incorrect resolution data and the user has not set a resolution value to override it, the algorithm guesses a reasonable resolution value:

  • EXIF is present. The resolution is calculated by the formula:
    max(96, RoundTo(max(width/8, height/12), 10)).
  • EXIF is absent and the image is anisotropic:
    • One of the dimensions is 50 dpi or less – the resolution is calculated by the formula from the first item;
    • The degree of anisotropy is below 10% – the resolution is set equal to the larger of the two values;
    • The degree of anisotropy is above 10% – the image is resampled up to the larger resolution.
  • EXIF is absent and the image is isotropic:
    • The resolution is below 50 dpi – the resolution is calculated by the formula from the first item;
    • The resolution is below 140 dpi and the formula from the first item gives a larger value – the image resolution is left intact, but internally the Engine works with the resolution calculated by the formula;
    • The resolution is above 140 dpi – that resolution is used as is.
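
The rules above can be sketched in Python as follows. This is an illustrative reading of the list, not ABBYY's actual implementation. Several points are assumptions: width and height are taken to be pixel dimensions, RoundTo(x, 10) is taken to mean rounding to the nearest multiple of 10, the degree of anisotropy is taken to be the relative difference between the two header resolutions, and cases the list does not cover (for example, exactly 10% anisotropy or exactly 140 dpi) fall through to the nearest branch. All function names are hypothetical.

```python
import math


def round_to(value, step=10):
    """Round to the nearest multiple of step (assumed meaning of RoundTo)."""
    return int(math.floor(value / step + 0.5)) * step


def formula_dpi(width_px, height_px):
    """max(96, RoundTo(max(width/8, height/12), 10)) from the first item."""
    return max(96, round_to(max(width_px / 8, height_px / 12), 10))


def guess_resolution(width_px, height_px, x_dpi, y_dpi, has_exif):
    """Return (dpi, action) for an image whose header resolution is suspect."""
    estimate = formula_dpi(width_px, height_px)
    if has_exif:
        return estimate, "formula"

    if x_dpi != y_dpi:  # anisotropic image
        low, high = sorted((x_dpi, y_dpi))
        if low <= 50:
            return estimate, "formula"
        # Assumption: anisotropy = relative difference of the two values.
        anisotropy = (high - low) / high
        if anisotropy < 0.10:
            return high, "use larger value"
        return high, "resample to larger value"

    # Isotropic image
    if x_dpi < 50:
        return estimate, "formula"
    if x_dpi < 140 and estimate > x_dpi:
        # Header stays untouched; only the internal working value changes.
        return estimate, "formula, internal only"
    # Assumption: every remaining case keeps the header value.
    return x_dpi, "keep header value"


# Example: an A4-sized page at 2480 x 3508 pixels tagged as 72 dpi, no EXIF.
print(guess_resolution(2480, 3508, 72, 72, has_exif=False))
```

Returning the action alongside the value keeps the "internal only" case distinguishable from the cases where the image resolution itself is replaced.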
