Word recognition variants and exact confidence calculation

If you wish to find out all recognition hypotheses for a word or character, do the following:

1. Before starting recognition, set the following recognizer parameters:

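// Keep every hypothesis the recognizer considered, not only the winning one.
IRecognizerParams recognizerParams = DocumentProcessingParams.PageProcessingParams.RecognizerParams;
recognizerParams.SaveWordRecognitionVariants = true;       // word-level variants
recognizerParams.SaveCharacterRecognitionVariants = true;  // character-level variants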

 

2. After recognition, the collection of hypotheses is accessible through the ICharParams.WordRecognitionVariants and ICharParams.CharacterRecognitionVariants properties and the IParagraph.GetWordRecognitionVariants method.

 

The sample code is in "Help" → "Guided Tour" → "Advanced Techniques" → "Using Voting API".

If you set RecognizerParams.ExactConfidenceCalculation = true before recognition, you will be able to access the Word.WordConfidence and CharParams.CharConfidence properties. These properties let you compare different recognition variants. For example, if the variant "0" has a confidence level of 80% and the variant "O" for the same character has a confidence level of 10%, FineReader Engine is almost certain that the symbol is "0". In contrast, if "0" comes with 20% and "O" comes with 30% confidence, FineReader Engine cannot reliably decide which result is correct.
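The comparison described above can be expressed as a small ranking step. The following is a minimal sketch that does not call FineReader Engine at all: the Variant record, the VariantRanking class and the MinMargin threshold of 30 are hypothetical names and values introduced only for illustration, and the sketch assumes confidence values are integers on a 0 to 100 scale. In a real application the text and confidence pairs would be copied from the variant collections described in step 2.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for one recognition variant; this is not an Engine type.
public record Variant(string Text, int Confidence);

public static class VariantRanking
{
    // Returns the highest-confidence variant and its lead over the runner-up.
    // With a single variant, the margin is the variant's own confidence.
    public static (Variant Best, int Margin) Rank(IEnumerable<Variant> variants)
    {
        var ordered = variants.OrderByDescending(v => v.Confidence).ToList();
        if (ordered.Count == 0)
            throw new ArgumentException("At least one variant is required.", nameof(variants));

        int runnerUp = ordered.Count > 1 ? ordered[1].Confidence : 0;
        return (ordered[0], ordered[0].Confidence - runnerUp);
    }
}

public static class Program
{
    public static void Main()
    {
        // Illustrative threshold (an assumption, not an Engine setting).
        const int MinMargin = 30;

        // The two cases from the text: a clear winner and a close call.
        var clear = new[] { new Variant("0", 80), new Variant("O", 10) };
        var unclear = new[] { new Variant("0", 20), new Variant("O", 30) };

        foreach (var set in new[] { clear, unclear })
        {
            var (best, margin) = VariantRanking.Rank(set);
            string verdict = margin >= MinMargin ? "confident" : "ambiguous";
            Console.WriteLine($"{best.Text}: margin {margin}, {verdict}");
        }
    }
}
```

Looking at the margin between the top two variants, rather than at the top confidence alone, matches the point of the example: a 30% winner that beats a 20% runner-up is still a close call and may deserve manual verification.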

Note: The main scenario for using word confidence is comparing two different images of the same document (for example, two photos taken with different camera settings). In this case, WordConfidence shows which photo is better suited for OCR. The value of WordConfidence is relative, not absolute, so it makes no sense to use it to compare OCR confidence between different documents. This parameter is only useful for comparing OCR quality within the same document.
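One possible way to apply this to two photos of the same page is sketched below. The article does not say how to combine the per-word values, so averaging them is an assumption of this sketch, as are the ImageComparison class, the MeanConfidence helper and the sample numbers. The code is engine-independent: it expects the WordConfidence values to have been collected into plain integer lists beforehand.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ImageComparison
{
    // Mean of the per-word confidence values collected from one image.
    public static double MeanConfidence(IReadOnlyCollection<int> wordConfidences)
    {
        if (wordConfidences.Count == 0)
            throw new ArgumentException("No words were recognized.", nameof(wordConfidences));
        return wordConfidences.Average();
    }

    public static void Main()
    {
        // Made-up values for two photos of the same page.
        var photoA = new[] { 62, 71, 55, 80 };
        var photoB = new[] { 48, 66, 41, 59 };

        double meanA = MeanConfidence(photoA);
        double meanB = MeanConfidence(photoB);

        // Only the relative order matters; the absolute numbers are not
        // comparable across different documents.
        Console.WriteLine(meanA >= meanB ? "Photo A" : "Photo B");
    }
}
```

Because the values are relative, the result should be read only as "this image of this document recognized better than that one", never as a quality score for the document itself.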
