OCR Voting API

Developers can combine several engines in their recognition solutions. In OCR, the term "voting" describes this approach: when the engines generate different recognition variants for a character or word, the developer selects the best variant by "voting" between the variants.
For many years, ABBYY FineReader Engine has offered a special Voting API that provides access to the different hypotheses for character or word recognition, together with their corresponding weight values.
Developers can use the FineReader Engine Voting API to check recognition results using their own databases and algorithms, and to correct the text results.
For example, the developer can build words from letters or check all generated hypotheses.
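
The voting idea itself can be illustrated with a minimal Python sketch. It does not use the FineReader Engine API: the vote function and the sample (text, weight) pairs are hypothetical, and summing the weights per variant is only one simple voting rule among many.

```python
from collections import defaultdict


def vote(variants):
    """Pick the variant with the highest summed weight.

    `variants` is a list of (text, weight) pairs collected from
    several engines (hypothetical input format, not an ABBYY type).
    """
    totals = defaultdict(int)
    for text, weight in variants:
        totals[text] += weight
    # On a tie, the variant that was seen first wins.
    return max(totals, key=totals.get)


# Three hypothetical engines report a variant for the same word image.
engine_outputs = [("dear", 80), ("clear", 75), ("clear", 70)]
winner = vote(engine_outputs)
```

In practice, weights reported by different engines are not necessarily on the same scale, so a real solution would likely normalize them before voting.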

ABBYY FineReader Engine provides two objects for this purpose:

  • WordRecognitionVariant
    This object represents a single hypothesis for a word and contains the text of the hypothesis, type of model, the average width of stroke, and information on whether the hypothesis has been found in the dictionary.
  • CharacterRecognitionVariant
    This object represents a single hypothesis for a character and contains character confidence, probability that a character is written with a serif font, and information on whether the character is superscript or subscript.
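
As a rough illustration of how such hypotheses might be used to build words from letters and check them against a custom database, here is a Python sketch. The classes and field names below are hypothetical stand-ins that loosely mirror the descriptions above (the model type and stroke width are omitted); they are not the actual FineReader Engine interfaces.

```python
from dataclasses import dataclass
from itertools import product


@dataclass
class CharHypothesis:
    # Hypothetical stand-in for one character hypothesis.
    char: str
    confidence: int
    serif_probability: int = 0
    is_superscript: bool = False
    is_subscript: bool = False


@dataclass
class WordHypothesis:
    # Hypothetical stand-in for one word hypothesis.
    text: str
    in_dictionary: bool


def build_words(positions, lexicon):
    """Combine per-position character hypotheses into word hypotheses.

    `positions` holds one list of CharHypothesis per character position;
    `lexicon` is the developer's own set of known words.
    """
    words = []
    for combo in product(*positions):
        text = "".join(h.char for h in combo)
        words.append(WordHypothesis(text, text in lexicon))
    return words


positions = [
    [CharHypothesis("c", 90)],
    [CharHypothesis("a", 92), CharHypothesis("o", 70)],
    [CharHypothesis("t", 88), CharHypothesis("l", 60)],
]
words = build_words(positions, lexicon={"cat", "cot"})
matches = [w.text for w in words if w.in_dictionary]
```

Here the four combinations "cat", "cal", "cot" and "col" are generated, and only those present in the custom lexicon are kept in matches. The number of combinations grows quickly with word length, so a real solution would likely prune low-confidence hypotheses first.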

Example of Character Recognition

During the layout analysis step, the coordinates of text areas, lines and individual characters are detected. After character separation, each character is recognized using several different text recognition technologies, algorithms and classifiers.

The recognition confidence of a single character image is a numerical estimate of the probability that the image does in fact represent this character.

For example, an image of the letter “e” may be recognized as:

  • the letter “e” with a confidence of 95,
  • the letter “c” with a confidence of 85,
  • the letter “o” with a confidence of 65, etc.

Normally, the hypothesis with the highest confidence rating is selected as the recognition result. However, the selection also depends on the context (i.e., the word in which the character occurs) and on the results of a differential comparison.

If the word with the “e” hypothesis is not a dictionary word while the word with the “c” hypothesis is a dictionary word, the latter will be selected as the recognition result, even though its confidence rating will still be 85. The remaining recognition variants can be obtained as hypotheses.
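
The selection rule in this example can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the engine's actual algorithm: the select_character function, the word "music" and the one-word dictionary are hypothetical, and the differential comparison step is not modeled.

```python
def select_character(prefix, suffix, candidates, dictionary):
    """Rank character candidates by dictionary context, then confidence.

    `candidates` is a list of (char, confidence) pairs; `prefix` and
    `suffix` are the already recognized parts of the surrounding word.
    Returns the selected candidate and the remaining hypotheses.
    """
    def rank(candidate):
        char, confidence = candidate
        in_dictionary = (prefix + char + suffix) in dictionary
        # A dictionary match outranks a higher raw confidence.
        return (in_dictionary, confidence)

    ordered = sorted(candidates, key=rank, reverse=True)
    return ordered[0], ordered[1:]


# Confidence values from the example above; the last letter of "musi?"
# is ambiguous, and only "music" is a dictionary word.
candidates = [("e", 95), ("c", 85), ("o", 65)]
best, remaining = select_character("musi", "", candidates, {"music"})
```

The selected candidate keeps its original confidence of 85, and the other variants stay available in remaining, which matches the behavior described above.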

Important Note: The Voting API is only available for OCR, not for recognition of hand-printed text.

 
