How to check the detected languages in the processed document

Question

How do I enable automatic language detection for each word, and how can I check which languages were detected after recognition?

Answer

The LanguageDetectionMode property of the RecognizerParams object controls automatic language detection. When autodetection is on, the recognition language is determined separately for each word in the text and is selected from the list of languages specified in the TextLanguage property. Autodetection is intended for documents whose language is not known in advance. For more information, see the FineReader Engine 12 Developer's Guide.

For autodetection to work, set the TextLanguage property to a list of predefined languages that may occur in the document.

You can view the list of languages detected in the recognized document or page using the DetectedLanguages property of the FRDocument or FRPage object. Here is a C# code snippet that uses DetectedLanguages:

// Assumes engineLoader, document and imagePath have already been initialized.
document.AddImageFile(imagePath, null, null);

// Create the parameter objects used for processing.
RecognizerParams RecParams = engineLoader.Engine.CreateRecognizerParams();
DocumentProcessingParams DocParams = engineLoader.Engine.CreateDocumentProcessingParams();
PageProcessingParams PageParams = engineLoader.Engine.CreatePageProcessingParams();

// List the predefined languages that may occur in the document.
RecParams.SetPredefinedTextLanguage("English, Latvian, Irish, Arabic");
// Turn on language autodetection for each word.
RecParams.PutLanguageDetectionMode(ThreeStatePropertyValueEnum.TSPV_Yes);
PageParams.PutRecognizerParams(RecParams);
DocParams.PutPageProcessingParams(PageParams);

Console.WriteLine("Recognizing...");
document.Process(DocParams);

// Print the number of detected languages followed by their internal names.
var langString = $"Recognized languages: {document.DetectedLanguages.Count} - ";
foreach (IDetectedLanguage lang in document.DetectedLanguages)
{
    langString += $"{lang.InternalName}, ";
}
Console.WriteLine(langString);
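
The loop above leaves a trailing ", " at the end of the printed line. Here is a minimal sketch of one way to avoid that, assuming the InternalName values have first been collected into a list of strings. DetectedLanguageReport and Format are hypothetical names used only for illustration and are not part of FineReader Engine:

```csharp
using System;
using System.Collections.Generic;

public static class DetectedLanguageReport
{
    // Builds the summary line from language names.
    // 'names' stands in for the InternalName values read from DetectedLanguages.
    public static string Format(IReadOnlyList<string> names)
    {
        // string.Join puts the separator only between items, so no trailing comma.
        return $"Recognized languages: {names.Count} - {string.Join(", ", names)}";
    }

    public static void Main()
    {
        var names = new List<string> { "English", "Latvian" };
        Console.WriteLine(Format(names));
        // prints "Recognized languages: 2 - English, Latvian"
    }
}
```

In the snippet above, the names could be gathered by adding lang.InternalName to a List<string> inside the foreach loop and then passing that list to Format.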

 

 
