How to get Word Recognition Variants and retrieve the exact confidence calculation?
You can retrieve all recognition hypotheses for a word or character as follows:
- During recognition, use the recognizer parameters below:
IRecognizerParams recognizerParams = DocumentProcessingParams.PageProcessingParams.RecognizerParams;
recognizerParams.SaveWordRecognitionVariants = true;
recognizerParams.SaveCharacterRecognitionVariants = true;
- After recognition, the collection of hypotheses is accessible through the ICharParams.WordRecognitionVariants and ICharParams.CharacterRecognitionVariant properties and the IParagraph.GetWordRecognitionVariants method.
The sample code is available in the Using Voting API article of the Online Developer's Help.
If RecognizerParams.ExactConfidenceCalculation = true is set during recognition, the Word.WordConfidence and CharParams.CharConfidence properties can be accessed.
These properties make it possible to compare different recognition variants. For example, if the character variant "0" has a confidence level of 80% and the variant "O" for the same character has a confidence level of 10%, FineReader Engine is almost certain that the symbol is "0". Conversely, if "0" comes with 20% confidence and "O" with 30%, FineReader Engine is unsure which result is correct.
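The comparison described above can be sketched as a small helper. This is only an illustration and does not call FineReader Engine: the Variant record, the IsConfident method and the 50-point margin are hypothetical, and the confidence values are assumed to have been copied from the recognition variants beforehand.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical container: in real code the text and confidence would be
// copied from the recognition variants returned by the engine.
public record Variant(string Text, int Confidence);

public static class VariantComparison
{
    // Returns true when the best variant leads the runner-up by at least
    // `margin` percentage points. The margin is an illustrative assumption,
    // not an engine setting.
    public static bool IsConfident(IReadOnlyList<Variant> variants, int margin = 50)
    {
        if (variants.Count == 0) return false;   // nothing to compare
        var ranked = variants.OrderByDescending(v => v.Confidence).ToList();
        if (ranked.Count == 1) return true;      // a single hypothesis has no rival
        return ranked[0].Confidence - ranked[1].Confidence >= margin;
    }
}
```

With the values from the example, IsConfident returns true for "0" at 80 and "O" at 10, and false for "0" at 20 and "O" at 30. A suitable margin depends on the documents being processed and would need to be tuned.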
Note: The main scenario for using word confidence is comparing two different images of the same document (such as two photos taken with different camera settings). In this case, WordConfidence shows which photo is better suited for OCR. The value of WordConfidence is relative, not absolute. It makes no sense to use WordConfidence as an absolute value to compare OCR confidence between different documents. This parameter is only useful for comparing OCR quality within the same document.
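A minimal sketch of that scenario, assuming the WordConfidence values of each photo have already been collected into a list of integers. The ImageComparison class, its methods and the use of a plain average are hypothetical choices made for illustration. The article does not prescribe how to aggregate word confidences.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ImageComparison
{
    // Averages the per-word confidence values collected from one recognition pass.
    // An empty collection yields 0 so that an image with no words never wins.
    public static double MeanConfidence(IReadOnlyCollection<int> wordConfidences) =>
        wordConfidences.Count == 0 ? 0.0 : wordConfidences.Average();

    // Returns the label of the image with the higher mean word confidence.
    // On a tie the first image is returned.
    public static string PickBetterImage(
        string labelA, IReadOnlyCollection<int> confidencesA,
        string labelB, IReadOnlyCollection<int> confidencesB)
    {
        return MeanConfidence(confidencesA) >= MeanConfidence(confidencesB)
            ? labelA
            : labelB;
    }
}
```

The result is only meaningful when both lists come from images of the same document, in line with the note above.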