Binarization Enhancements in ABBYY Technologies

Adaptive binarization is an innovative approach within ABBYY's image pre-processing algorithms. The technologies were optimized to improve the quality of the source images passed to the character recognizers. At the same time, the optimized binarization approach makes it possible to preserve more text on images of degraded quality and to remove “noise” caused by 'shine-through' text from the back side of the document page.

Optimization of the binarization algorithms and its impact on the amount of 'rescued' text

The images below show the difference: with the optimized binarization process, a significantly higher portion of the text can be extracted from a page that was scanned under adverse lighting. 'Standard binarization' would only allow text to be retrieved from areas not affected by the light.

fre10_new_binarisation01.png
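
The effect of uneven lighting can be illustrated with a minimal sketch. It is not ABBYY's algorithm, which is not described here; it only contrasts a single fixed threshold with a generic local-mean adaptive threshold. All function names, the synthetic page, and the parameter values (`threshold`, `radius`, `offset`) are assumptions chosen for illustration.

```python
def global_threshold(image, threshold=128):
    """Mark a pixel as ink (1) when it is darker than one fixed threshold."""
    return [[1 if px < threshold else 0 for px in row] for row in image]


def adaptive_threshold(image, radius=2, offset=20):
    """Mark a pixel as ink (1) when it is darker than its local mean minus offset."""
    height, width = len(image), len(image[0])
    result = []
    for y in range(height):
        out_row = []
        for x in range(width):
            # Collect the neighbourhood, clipped at the image borders.
            window = [
                image[ny][nx]
                for ny in range(max(0, y - radius), min(height, y + radius + 1))
                for nx in range(max(0, x - radius), min(width, x + radius + 1))
            ]
            local_mean = sum(window) / len(window)
            out_row.append(1 if image[y][x] < local_mean - offset else 0)
        result.append(out_row)
    return result


def make_page(width=12, height=5, ink_columns=(2, 6, 10)):
    """Synthetic page: the background brightens from left to right and
    vertical ink strokes are 80 grey levels darker than the paper around them."""
    page = []
    for _ in range(height):
        row = []
        for x in range(width):
            background = 100 + 12 * x
            row.append(background - 80 if x in ink_columns else background)
        page.append(row)
    return page


def ink_columns_found(binary):
    """Return the column indices marked as ink in the first row."""
    return {x for x, value in enumerate(binary[0]) if value == 1}


page = make_page()
global_result = ink_columns_found(global_threshold(page))
adaptive_result = ink_columns_found(adaptive_threshold(page))
print("global:  ", sorted(global_result))    # [0, 1, 2, 6]
print("adaptive:", sorted(adaptive_result))  # [2, 6, 10]
```

In this toy page the fixed threshold mistakes the shadowed left edge for ink and loses the stroke in the brightly lit area, while the local threshold recovers all three strokes.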

The images below show the difference when processing a low-quality image with white text on a black page. A standard binarization approach would only allow text to be extracted from a very small area of the image, while the optimized binarization algorithms extract the text completely.

fre10_new_binarisation02.png
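
One common pre-processing step for light-on-dark pages is to normalize polarity before thresholding. The sketch below is a generic illustration and makes no claim about how ABBYY handles such pages; the function name `normalize_polarity`, the `midpoint` value, and the sample data are hypothetical.

```python
def normalize_polarity(image, midpoint=128):
    """Invert the image when most pixels are dark, so text ends up dark on light."""
    pixels = [px for row in image for px in row]
    dark_count = sum(1 for px in pixels if px < midpoint)
    if dark_count > len(pixels) / 2:
        return [[255 - px for px in row] for row in image]
    return image


# White strokes (value 240) on a black page (value 15).
inverted_page = [
    [15, 240, 15, 15],
    [15, 240, 15, 15],
    [15, 15, 15, 15],
]
normalized = normalize_polarity(inverted_page)
print(normalized[0])  # [240, 15, 240, 240]
```

After this step the strokes are dark on a light background, so the same thresholding logic used for ordinary pages can be applied.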

The images below show the difference when processing newspaper pages (which are very thin) where text on the back of the page shines through. With the new binarization approach, the shine-through text can be ignored during the recognition process. This leads to better recognition results.

  • Book scanners often use strong light. After standard binarization, this can leave “ghost text lines” that shine through but carry no useful information.
  • ABBYY's improved binarization technology is able to detect this and remove the “garbage” before text recognition is applied.

fre10_new_binarisation03.png
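
The idea of ignoring shine-through can be sketched with a simple contrast rule: faint text from the back side is only slightly darker than the paper, whereas printed text on the front is much darker. This is an assumed, simplified stand-in and not ABBYY's detection method; the function name, `paper_radius`, `min_contrast`, and the sample scan line are all hypothetical.

```python
def binarize_with_min_contrast(row, paper_radius=3, min_contrast=100):
    """Mark a pixel as ink (1) only if it is much darker than the nearby paper."""
    result = []
    for x, px in enumerate(row):
        window = row[max(0, x - paper_radius): x + paper_radius + 1]
        paper_level = max(window)  # brightest neighbour approximates blank paper
        result.append(1 if paper_level - px >= min_contrast else 0)
    return result


# One scan line: paper is about 230, front-side ink about 40, shine-through about 180.
scan_line = [230, 228, 40, 231, 229, 180, 232, 45, 230]

sensitive = binarize_with_min_contrast(scan_line, min_contrast=30)
strict = binarize_with_min_contrast(scan_line, min_contrast=100)
print(sensitive)  # [0, 0, 1, 0, 0, 1, 0, 1, 0]  ghost stroke kept
print(strict)     # [0, 0, 1, 0, 0, 0, 0, 1, 0]  ghost stroke removed
```

A low contrast requirement keeps the faint stroke as a "ghost", while a stricter one retains only the front-side text.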

The optimized binarization was introduced in version 10 of FineReader Engine (10/2010) and version 10 of FlexiCapture Engine (10/2012).
