Community

confidence attribute, processing profile

Written by Permanently deleted user

May 18, 2012 06:55
4

I am processing mobile photos.

I have noticed that the "confidence" attribute is not provided for chars when I use processImage. Is this only provided for processing text fields?

Also, I usually get the best results for processImage with a profile of "documentConversion" -- this usually includes correct text, and skips incorrect text. When I switch to a "textExtraction" profile I expect better text, but instead it just adds a lot of noise. Is this unexpected?


Comments

4 comments

SDK Support Team

May 18, 2012 08:00
For processImage, XML is the only output format that carries confidence information. Parse the XML and look for the suspicious="1" attribute, which marks characters the engine is uncertain about.

For example:

<charParams b="64" r="214" t="51" l="205">T</charParams>
<charParams b="64" r="229" t="52" l="216" suspicious="1">H</charParams>
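As a rough illustration of the parsing step, here is a minimal Python sketch that collects the characters flagged with suspicious="1" together with their l/t/r/b coordinates. The function name `suspicious_chars` and the wrapping `<line>` element in the sample are assumptions made for this example; the real XML output has more structure around the charParams elements, so treat this as a starting point rather than a complete parser.

```python
import xml.etree.ElementTree as ET

def suspicious_chars(xml_text):
    """Return (char, (l, t, r, b)) for each charParams element with suspicious="1"."""
    root = ET.fromstring(xml_text)
    found = []
    for elem in root.iter():
        # Strip any XML namespace so the match works with or without one.
        tag = elem.tag.rsplit("}", 1)[-1]
        if tag == "charParams" and elem.get("suspicious") == "1":
            box = tuple(int(elem.get(k)) for k in ("l", "t", "r", "b"))
            found.append((elem.text, box))
    return found

# The <line> wrapper is hypothetical; it only makes the sample well-formed.
sample = (
    "<line>"
    '<charParams b="64" r="214" t="51" l="205">T</charParams>'
    '<charParams b="64" r="229" t="52" l="216" suspicious="1">H</charParams>'
    "</line>"
)

print(suspicious_chars(sample))  # [('H', (216, 52, 229, 64))]
```

Characters without the attribute are simply skipped, which mirrors the "present or not" nature of the flag.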

The "textExtraction" profile is optimized to extract as much text from a document as possible. The recognized text is intended for search scenarios, for example adding an image to a full-text search database so that you can later find the document by typing one or more words from it. More noise is normal with this profile, because noise is not considered very harmful in that scenario.

The "documentConversion" profile is optimized for text reuse. It allows reconstruction of the page layout, formatting, and other page elements. That is why it is the default processing profile.
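To make the profile choice concrete, here is a small Python sketch that builds a processImage request URL for a given profile with XML output. The query parameter names `profile` and `exportFormat`, the host in `base`, and the helper name `process_image_url` are assumptions for illustration and are not confirmed in this thread; check the API documentation before relying on them. The sketch only builds the URL and does not send a request or handle authentication.

```python
from urllib.parse import urlencode

def process_image_url(profile="documentConversion", export_format="xml",
                      base="http://cloud.ocrsdk.com/processImage"):
    """Build a request URL; parameter names and host are assumed, not confirmed here."""
    query = urlencode({"profile": profile, "exportFormat": export_format})
    return base + "?" + query

# Default profile, suited to text reuse.
print(process_image_url())
# Profile aimed at search indexing; expect more noise in the output.
print(process_image_url("textExtraction"))
```

Running the same image through both URLs and comparing the XML is one way to see how much extra noise "textExtraction" adds for a particular kind of photo.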
0
Permanently deleted user

May 18, 2012 08:12
Thanks for your answer, that is helpful. Regarding confidence, I am wondering about the difference between "suspicious" and "confidence." In your example here you provide confidence as a number between 1 and 100:

http://ocrsdk.com/documentation/quick-start/text-fields/

However, suspicious seems to be 1 or not-present. What is the reason for the difference?

0
SDK Support Team

May 18, 2012 08:20
"Suspicious" is a bit flag: it is either present or absent. If it is present, the recognition engine is not sure that it recognized the character correctly.

Confidence is an integer from 1 to 100. It represents how closely the recognized character matches what the recognizer expects it to look like.

The "confidence" attribute is quite confusing, so we plan to replace it with "suspicious" in all text-field processing.

0
Permanently deleted user

October 07, 2013 23:58
How feasible is it to annotate PDF output with confidence metrics? For example, by producing both XML and PDF, may one reasonably extract low confidence ranges from XML and figure out where this attribute should be inserted into PDF? Do I assume correctly that XML tells you on just what page text appears (not where on page)...or does layout analysis break down page into text blocks so recognition confidence issues will be associated with a text block? Thanks for any help.

0
