Question
How can I use automatic language detection for each word, and how can I check which languages were detected after recognition?
Answer
The LanguageDetectionMode property of the RecognizerParams object controls automatic language detection. When language autodetection is on, the recognition language is detected for each word in the text and is selected from the list of languages specified in the TextLanguage property. Autodetection is intended for recognizing documents whose language is not known in advance. You can read more about it in the FineReader Engine 12 Developer's Guide.
For autodetection to work, specify in the TextLanguage property the list of predefined languages that may occur in the document.
You can view the list of languages detected in the recognized document or recognized page using the DetectedLanguages property of the FRDocument or FRPage object. Here's a code snippet for using DetectedLanguages in C#:
// Assumes engineLoader and document (an FRDocument) have already been created.
document.AddImageFile(imagePath, null, null);
RecognizerParams RecParams = engineLoader.Engine.CreateRecognizerParams();
DocumentProcessingParams DocParams = engineLoader.Engine.CreateDocumentProcessingParams();
PageProcessingParams PageParams = engineLoader.Engine.CreatePageProcessingParams();
// List the predefined languages that may occur in the document.
RecParams.SetPredefinedTextLanguage("English, Latvian, Irish, Arabic");
// Turn on language autodetection for each word.
RecParams.PutLanguageDetectionMode(ThreeStatePropertyValueEnum.TSPV_Yes);
PageParams.PutRecognizerParams(RecParams);
DocParams.PutPageProcessingParams(PageParams);
Console.WriteLine("Recognizing...");
document.Process(DocParams);
// Report the languages that were detected in the recognized document.
var langString = $"Recognized languages: {document.DetectedLanguages.Count} - ";
foreach (IDetectedLanguage lang in document.DetectedLanguages)
{
    langString += $"{lang.InternalName}, ";
}
Console.WriteLine(langString);
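The loop in the snippet above leaves a trailing ", " at the end of the printed line. As a minimal sketch of one way to avoid that, the names can be collected first and joined with string.Join. The DetectedLanguageReport class and its Format method are hypothetical helpers, not part of FineReader Engine, and the sample names stand in for the lang.InternalName values read from DetectedLanguages:

```csharp
using System;
using System.Collections.Generic;

static class DetectedLanguageReport
{
    // Hypothetical helper: builds the "Recognized languages: N - a, b" line
    // without a trailing separator.
    public static string Format(IReadOnlyCollection<string> names)
    {
        return $"Recognized languages: {names.Count} - {string.Join(", ", names)}";
    }

    static void Main()
    {
        // Sample values standing in for lang.InternalName of each detected language.
        var names = new List<string> { "English", "Latvian" };
        Console.WriteLine(Format(names)); // prints "Recognized languages: 2 - English, Latvian"
    }
}
```

In the snippet above, the list would be filled inside the foreach loop by adding lang.InternalName for each detected language.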