ABBYY FineReader - language selection options during OCR

Hello,

I have been using ABBYY FineReader for many years, and recently I started wondering what the difference is between the following options when selecting a language:

  • Automatically select OCR languages from the following list
  • Specify OCR languages manually

From my own observations, the list of languages in the first case is much shorter, but in both cases I can select many languages at the same time. I suspect that option no. 1 gives better results, but I don't really know what the specific reason for this would be.

Given that we choose the same languages in both options, what exactly is the difference in their use?

Thank you in advance for your answer!

Comments

  • Yuriy Korotkevych

    Hello Kacper,

    You're right, the "Automatically select" option can, overall, give better-quality results with fewer OCR errors. But the main purpose of this function is to free users who work with documents in multiple languages, or with multi-language documents, from having to switch the document language(s) in FineReader for every single document they work with.

    When we specify a list of multiple document languages with the "Specify OCR languages manually" option, we basically force FineReader to try using all of the selected languages for the document being processed. If all of these languages occur in the document, then all is good. But if some of them don't, FineReader will still try to apply all of them when reading the document, which can lead to less accurate recognition. So, for the best results, FineReader should apply information (character sets, dictionaries, formats, etc.) only for the languages that are actually used in that specific document.

    And this is exactly what the "Automatically select" option achieves, while freeing users from the need to select specific languages every single time. When automatic language selection is used, FineReader first detects which of the languages from the list are used in the document the user is currently working with, and only then proceeds to OCR and convert the document in full, applying not all the languages from the list, but only those that were detected in the document. So by using the "Automatically select" option we "tell" FineReader: "I may work with documents that can be in one of the listed languages or in a mix of any of them, but you should find out which one(s) to use for every single document yourself." And FineReader does so.
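
    As a rough illustration of the two modes, here is a toy Python sketch. It is not FineReader's actual algorithm or API: the function names, the `LANGUAGE_SCRIPT` table, and the idea of detecting languages from the script of already-decoded text are all assumptions made for the example (a real engine has to detect languages from the page image itself).

```python
import unicodedata

# Hypothetical language-to-script table, invented for this sketch.
LANGUAGE_SCRIPT = {"English": "LATIN", "Russian": "CYRILLIC", "Greek": "GREEK"}


def scripts_in(sample):
    """Collect the script prefix of every letter's Unicode name."""
    return {unicodedata.name(ch).split()[0] for ch in sample if ch.isalpha()}


def languages_applied(mode, selected, sample):
    """Return the languages that would be used for the full OCR pass."""
    if mode == "manual":
        # Every selected language is applied, whether or not it occurs.
        return list(selected)
    if mode == "auto":
        # Step 1: detect which listed languages occur in the document.
        present = scripts_in(sample)
        # Step 2: keep only the detected ones for the full pass.
        return [lang for lang in selected if LANGUAGE_SCRIPT[lang] in present]
    raise ValueError(f"unknown mode: {mode!r}")


selected = ["English", "Russian", "Greek"]
sample = "The whole document is in English"
print(languages_applied("manual", selected, sample))  # ['English', 'Russian', 'Greek']
print(languages_applied("auto", selected, sample))    # ['English']
```

    The same list is selected in both calls; the only difference is whether the list is narrowed down per document before recognition starts.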

    To answer your specific question, "Given that we choose the same languages in both options, what exactly is the difference in their use?":

    • if all of these languages are used in the specific document you're working on right now, then there should be no difference;
    • if only one or some of these languages are used in the specific document you're working on right now, then using "Specify manually" may result in worse recognition quality. The more of the manually specified languages are missing from the actual document, the more mistakes may happen.
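
    One plausible way in which an unused language can add mistakes is look-alike letters. The sketch below is only an illustration under that assumption, not documented FineReader behaviour: the Latin/Cyrillic pairs are real Unicode characters, but the `candidates` and `ambiguous_positions` helpers are hypothetical and simply count how many printed shapes gain a second possible reading once Russian is added to an English-only document.

```python
# Latin letters that have a visually near-identical Cyrillic letter.
LATIN_TO_CYRILLIC = {
    "a": "\u0430",  # Cyrillic small a
    "c": "\u0441",  # Cyrillic small es
    "e": "\u0435",  # Cyrillic small ie
    "o": "\u043e",  # Cyrillic small o
    "p": "\u0440",  # Cyrillic small er
    "x": "\u0445",  # Cyrillic small ha
    "y": "\u0443",  # Cyrillic small u
}


def candidates(shape, languages):
    """Characters a recognizer could output for one printed (Latin) shape."""
    result = []
    if "English" in languages:
        result.append(shape)
    if "Russian" in languages and shape in LATIN_TO_CYRILLIC:
        result.append(LATIN_TO_CYRILLIC[shape])
    return result


def ambiguous_positions(word, languages):
    """Count the letters in word that have more than one candidate reading."""
    return sum(1 for ch in word if len(candidates(ch, languages)) > 1)


print(ambiguous_positions("copy", ["English"]))             # 0
print(ambiguous_positions("copy", ["English", "Russian"]))  # 4
```

    In this toy model, every letter of "copy" becomes ambiguous as soon as Russian is in the list, even though the document contains no Russian at all.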
  • Kacper Ribszleger

    Hey Yuriy,

    This is very insightful, thank you!

    However, I have one follow-up question about this part: "FineReader will anyway try applying all of them (languages) to read the document". What exactly does it mean? In the end, multiple languages can't be applied at once, only a single one. Does ABBYY recognize the language at the character, word, or sentence level in that scenario? Maybe it would be good to show this with an example. I think the following scenario could be a good one:

    • Scanned document
    • "Specify OCR languages manually" enabled with languages English and Russian 
    • The whole document in English

    What potential errors could occur in such a case compared to auto-select, and why would they occur?

    I would like to better understand how ABBYY's processing of a file differs between the manual and automatic language selection settings, broken down into its basic steps.
