The Pages panel in Verification Station shows several types of information, including the percentage of low-confidence characters, errors/warnings, etc. I suggest adding another type (as a user option, not a default): the percentage of the page covered by text areas or table areas. This would help me identify pages where Recognition Server has incorrectly identified text areas as picture areas, which sometimes happens with complex layouts, incorrect detection of rotated text, text inside graphical borders, or wavy/curled text lines. I would sort by this value and then examine the pages with a relatively low percentage of text or table areas to see whether any correction is needed. I don't know how much this would help others, but I would use it on almost every document I OCR.
By percentage covered, I mean the combined size of the text areas and table areas, in square pixels, divided by the total page size in square pixels.
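To make the calculation concrete, here is a rough sketch in Python. It is not ABBYY code: the `Area` type, its fields, and the function name are hypothetical, and it assumes each area is an axis-aligned rectangle in pixel coordinates and that areas do not overlap (overlapping areas would be counted twice).

```python
from typing import Iterable, NamedTuple


class Area(NamedTuple):
    """Hypothetical layout area: an axis-aligned rectangle in pixels."""
    kind: str    # e.g. "text", "table", or "picture"
    left: int
    top: int
    right: int
    bottom: int


def text_table_coverage(areas: Iterable[Area], page_width: int, page_height: int) -> float:
    """Return the percentage of the page covered by text and table areas."""
    page_pixels = page_width * page_height
    if page_pixels <= 0:
        raise ValueError("page dimensions must be positive")
    # Add up the sizes (in square pixels) of text and table areas only.
    covered = sum(
        (a.right - a.left) * (a.bottom - a.top)
        for a in areas
        if a.kind in ("text", "table")
    )
    return 100.0 * covered / page_pixels


# Example page: 1000 x 2000 pixels with one text, one table, and one picture area.
example_areas = [
    Area("text", 0, 0, 500, 1000),
    Area("table", 500, 1000, 1000, 1500),
    Area("picture", 0, 1500, 1000, 2000),
]
example_coverage = text_table_coverage(example_areas, 1000, 2000)
```

A page with a low value here relative to the rest of the document would be one I would open and check first.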
You could also create a warning for the error/warning section based on this information. For example, pages with text+table coverage less than 1/4 of the average text+table coverage could show a warning like: "Check to make sure text has not been incorrectly identified as pictures."
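The warning rule could look roughly like the following Python sketch. This is only an illustration: the function name and the `ratio` parameter are hypothetical, the input is assumed to be one text+table coverage percentage per page, and "average" is assumed to mean the mean over all pages of the same document.

```python
from statistics import mean
from typing import List, Sequence

WARNING_TEXT = "Check to make sure text has not been incorrectly identified as pictures."


def low_coverage_pages(coverages: Sequence[float], ratio: float = 0.25) -> List[int]:
    """Return zero-based indexes of pages whose coverage is below
    `ratio` times the document's average coverage."""
    if not coverages:
        return []
    threshold = ratio * mean(coverages)
    return [i for i, c in enumerate(coverages) if c < threshold]


# Example: the average is 35%, so the threshold is 8.75% and only the third page is flagged.
flagged = low_coverage_pages([40.0, 50.0, 5.0, 45.0])
for page_index in flagged:
    print(f"Page {page_index + 1}: {WARNING_TEXT}")
```

The 1/4 ratio is just the starting point suggested above; making it adjustable would let users tune it for documents that legitimately contain many full-page pictures.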
This would be a great feature to have in FineReader, too.
Thanks!
1 comment
Hello,
Thank you for your feedback. We have forwarded this information to ABBYY HQ.