How to check the detected languages in the processed document

Question

How do I use automatic language detection for each word, and how do I check which languages were detected after recognition?

Answer

The LanguageDetectionMode property of the RecognizerParams object controls automatic language detection. When autodetection is on, the recognition language is detected for each word in the text and is selected from the list of languages specified in the TextLanguage property. Autodetection is intended for documents whose language you do not know in advance. For more information, see the FineReader Engine 12 Developer's Guide.

For autodetection to work, list in the TextLanguage property the predefined languages that may occur in the document.

You can view the list of languages detected in the recognized document or page using the DetectedLanguages property of the FRDocument or FRPage object. Here's a code snippet that uses DetectedLanguages in C#:

// `document` and `engineLoader` are assumed to have been created earlier.
document.AddImageFile(imagePath, null, null);

// Create the parameter objects that are nested into each other below.
RecognizerParams RecParams = engineLoader.Engine.CreateRecognizerParams();
DocumentProcessingParams DocParams = engineLoader.Engine.CreateDocumentProcessingParams();
PageProcessingParams PageParams = engineLoader.Engine.CreatePageProcessingParams();

// List the candidate languages and switch per-word autodetection on explicitly.
RecParams.SetPredefinedTextLanguage("English, Latvian, Irish, Arabic");
RecParams.PutLanguageDetectionMode(ThreeStatePropertyValueEnum.TSPV_Yes);
PageParams.PutRecognizerParams(RecParams);
DocParams.PutPageProcessingParams(PageParams);
Console.WriteLine("Recognizing...");
document.Process(DocParams);

// Print the number of detected languages followed by their internal names.
var langString = $"Recognized languages: {document.DetectedLanguages.Count} - ";
foreach (IDetectedLanguage lang in document.DetectedLanguages)
{
    langString += $"{lang.InternalName}, ";
}
Console.WriteLine(langString);
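
The loop above leaves a trailing ", " at the end of the printed line. One possible way to avoid that is to collect the names first and join them. The sketch below is self-contained and does not call FineReader Engine: the strings in the list are stand-ins for the lang.InternalName values, and LanguageListDemo and Format are hypothetical names used only for illustration.

```csharp
using System;
using System.Collections.Generic;

static class LanguageListDemo
{
    // Builds the summary line; string.Join puts ", " only between items,
    // so no separator is left dangling after the last name.
    public static string Format(IReadOnlyCollection<string> names)
    {
        string joined = string.Join(", ", names);
        return $"Recognized languages: {names.Count} - {joined}";
    }

    static void Main()
    {
        // Stand-in data; in the snippet above these would be gathered
        // from lang.InternalName inside the foreach loop.
        var names = new List<string> { "English", "Latvian" };
        Console.WriteLine(Format(names));
    }
}
```

In the snippet above, this would amount to adding each lang.InternalName to a list inside the foreach loop and passing that list to the helper.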

 

 


Comments

3 comments

  • Peter Kirchgessner

    Thank you for the code snippet. From the user guide of FRE 12.4.7.63 it was not clear that TSPV_Yes, and not TSPV_Auto (the default), must be used to enable auto detection.

  • Radityo Pratomo

    Hey Sergey, is it possible to get the detected language in ABBYY Vantage? Would you mind sharing the wisdom?

  • Sergey Pilipchuk

    Hi Radityo,

    You can use the OCR skill JSON export to get a list of detected languages.
