I am testing ABBYY FineReader PDF as a possible tool to integrate into our TIF to PDF/A (with OCR) archiving workflow.
I am seeing a behaviour and I cannot tell whether it is expected or not. The quickest way to reproduce it:
(context: all options are on the default values)
- Open ABBYY FineReader OCR Editor
- File > Open Image...
- Select my TIF scan (600dpi of a printed text document)
- Recognition is excellent, except for one word (my surname);
my name ends with 'gli' and there is an Italian word that is identical, except that it ends with 'gii';
the Italian word is preferred to my name 100% of the time;
perfectly consistent, on all documents I tested (different fonts, bold, italic, ...)
- I select a text area with my name and go to Area > Recognize Area (Ctrl+Shift+B)
- My name is correctly recognized, clearly pulling the term from the user dictionary for Italian, where I previously added it
- I go to Recognize > Recognize Page (Ctrl+R)
- My name is changed back to the wrong Italian word
- I go back to step 5 and manually recognize the area again; my name is correct once more.
So the question is: why is the user dictionary ignored during the normal recognition process and only used when I manually recognize a single area?
The objective of the archival workflow is to have confidence (within reason, perfection is not expected) that names and specialized terms are correctly recognized, pulling them from the user dictionary for the document language, because those may be exactly the terms used to find the documents later with search tools.
It is not feasible to open every single document in the OCR Editor to correct and re-save, unless the scanned original paper document is a low-quality or damaged print and needs to be double-checked.
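As a stopgap while this is unresolved, documents could at least be flagged for manual review without opening each one. Below is a minimal Python sketch, assuming the recognized text of each document has already been extracted to a plain string by some other means. It does not use any ABBYY API; the function name `find_suspect_terms` and the placeholder terms are hypothetical and only illustrate the idea of scanning for known misrecognitions.

```python
import re

# Hypothetical mapping: wrong OCR output -> term expected from the user dictionary.
# "Examplegii" / "Examplegli" are placeholders, not real names.
KNOWN_CONFUSIONS = {"Examplegii": "Examplegli"}


def find_suspect_terms(text, confusions):
    """Return (wrong, expected, count) for each known misrecognition found in text."""
    hits = []
    for wrong, expected in confusions.items():
        # Whole-word, case-insensitive match so bold or upper-case variants are caught.
        pattern = re.compile(r"\b" + re.escape(wrong) + r"\b", re.IGNORECASE)
        count = len(pattern.findall(text))
        if count:
            hits.append((wrong, expected, count))
    return hits


if __name__ == "__main__":
    sample = "Signed by Mario Examplegii. Copy to M. EXAMPLEGII and to Mr Examplegli."
    for wrong, expected, count in find_suspect_terms(sample, KNOWN_CONFUSIONS):
        print(f"{count} occurrence(s) of '{wrong}' (expected '{expected}')")
```

This only narrows down which documents to open. It cannot tell a misrecognized name from a legitimate use of the dictionary word, so every hit still needs a human check.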
I would think this behaviour is either a bug or something I am doing wrong... thank you for your opinion on the matter ;-)
Comments
Hi, Roberto!
I've created a support ticket based on the situation you described. A customer support specialist will get back to you shortly.