How to combine OCR languages

To recognize text in several predefined languages at once, for example English and German, pass them to RecognizerParams as a comma-separated list:

DocumentProcessingParams.PageProcessingParams.RecognizerParams.SetPredefinedTextLanguage("English, German");

Combining a custom text language (for example, a dictionary-based one) with a predefined language takes a few more lines of code.

Suppose you are working with the "CustomLanguage" code sample and want to add German to your custom text language. The sample already does the following:

  • Creates your custom TextLanguage:
    textLanguage = LanguageDatabase.CreateTextLanguage();
  • Copies the settings of a predefined text language, for example English, into it:
    textLanguage.CopyFrom(Engine.PredefinedLanguages.Find("English").TextLanguage);
  • Fills textLanguage.BaseLanguages[0] with your custom dictionary by adding a user dictionary description
    baseLanguage.DictionaryDescriptions.AddNew(DictionaryTypeEnum.DT_UserDictionary);
    and specifying the dictionary file, where baseLanguage is textLanguage.BaseLanguages[0].

  • At this point, textLanguage contains one English-based BaseLanguage. Append a second BaseLanguage:
    textLanguage.BaseLanguages.AddNew();
  • Copy the settings of German's base language into the new BaseLanguage:
    textLanguage.BaseLanguages[1].CopyFrom(engineLoader.Engine.PredefinedLanguages.Find("German").TextLanguage.BaseLanguages[0]);

textLanguage now combines your custom dictionary-based language with German and can be used for recognition.
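To make the structure behind these steps easier to see, here is a toy Python model of it. It is not the FineReader Engine API: the classes BaseLanguage and TextLanguage, the methods copy_from and add_new_base_language, the PREDEFINED table and the dictionary strings are hypothetical stand-ins for TextLanguage, BaseLanguages, CopyFrom and PredefinedLanguages. The model assumes only what the steps above imply: a text language is an ordered collection of base languages, each carrying its own dictionaries, so combining languages means appending another base language rather than overwriting the first.

```python
from copy import deepcopy
from dataclasses import dataclass, field


# Hypothetical stand-in for a FineReader Engine BaseLanguage (NOT the real API).
@dataclass
class BaseLanguage:
    name: str
    dictionaries: list = field(default_factory=list)

    def copy_from(self, other: "BaseLanguage") -> None:
        # Copy settings so later edits do not touch the source language.
        self.name = other.name
        self.dictionaries = deepcopy(other.dictionaries)


# Hypothetical stand-in for a TextLanguage: an ordered list of base languages.
@dataclass
class TextLanguage:
    base_languages: list = field(default_factory=list)

    def copy_from(self, other: "TextLanguage") -> None:
        self.base_languages = deepcopy(other.base_languages)

    def add_new_base_language(self) -> BaseLanguage:
        new_base = BaseLanguage(name="")
        self.base_languages.append(new_base)
        return new_base


# Hypothetical predefined languages, each with a single base language.
PREDEFINED = {
    "English": TextLanguage([BaseLanguage("English", ["english-builtin"])]),
    "German": TextLanguage([BaseLanguage("German", ["german-builtin"])]),
}


def build_combined(dictionary_file: str) -> TextLanguage:
    # Steps already done by the sample: create, copy English, add user dictionary.
    text_language = TextLanguage()
    text_language.copy_from(PREDEFINED["English"])
    text_language.base_languages[0].dictionaries.append(("user", dictionary_file))

    # New steps: append a second base language and copy German into it.
    text_language.add_new_base_language()
    text_language.base_languages[1].copy_from(PREDEFINED["German"].base_languages[0])
    return text_language


if __name__ == "__main__":
    combined = build_combined("custom.dic")
    for base in combined.base_languages:
        print(base.name, base.dictionaries)
```

Because each base language is copied rather than shared, adding the user dictionary to the first slot leaves the predefined English and German definitions unchanged, which mirrors why the article copies settings with CopyFrom instead of reusing the predefined objects directly.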
