Question
How do I enable automatic language detection for each word, and how can I check which languages were detected after recognition?
Answer
The LanguageDetectionMode property of the RecognizerParams object controls automatic language detection. When autodetection is on, the recognition language is detected for each word in the text and is selected from the list of languages specified in the TextLanguage property. Autodetection is intended for documents whose language is not known in advance. For more information, see the FineReader Engine 12 Developer's Guide.
For autodetection to work, set the TextLanguage property to a list of predefined languages that may occur in the document.
You can view the list of languages detected in the recognized document or page using the DetectedLanguages property of the FRDocument or FRPage object. Here is a C# code snippet that uses DetectedLanguages:
document.AddImageFile(imagePath, null, null);
RecognizerParams RecParams = engineLoader.Engine.CreateRecognizerParams();
DocumentProcessingParams DocParams = engineLoader.Engine.CreateDocumentProcessingParams();
PageProcessingParams PageParams = engineLoader.Engine.CreatePageProcessingParams();
// Candidate languages: autodetection picks from this list for each word.
RecParams.SetPredefinedTextLanguage("English, Latvian, Irish, Arabic");
// Turn autodetection on explicitly with TSPV_Yes.
RecParams.PutLanguageDetectionMode(ThreeStatePropertyValueEnum.TSPV_Yes);
// Pass the recognizer settings down to the page and document processing parameters.
PageParams.PutRecognizerParams(RecParams);
DocParams.PutPageProcessingParams(PageParams);
Console.WriteLine("Recognizing...");
document.Process(DocParams);
// DetectedLanguages is available only after the document has been processed.
var langString = $"Recognized languages: {document.DetectedLanguages.Count} - ";
foreach (IDetectedLanguage lang in document.DetectedLanguages)
{
langString += $"{lang.InternalName}, ";
}
Console.WriteLine(langString);
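The snippet above writes the language list as a string literal and leaves a trailing ", " in the printed result. As a minimal sketch, the two hypothetical helpers below tidy up both strings. LanguageListHelper, BuildLanguageList and FormatDetected are illustrative names, not part of FineReader Engine. They work on plain strings, so they compile without the SDK.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper class; not part of the FineReader Engine API.
static class LanguageListHelper
{
    // Builds a comma-separated list such as "English, Latvian" in the same
    // form as the string passed to SetPredefinedTextLanguage above.
    // Blank entries and case-insensitive duplicates are dropped.
    public static string BuildLanguageList(IEnumerable<string> names) =>
        string.Join(", ", names
            .Select(n => n.Trim())
            .Where(n => n.Length > 0)
            .Distinct(StringComparer.OrdinalIgnoreCase));

    // Formats detected language names without a trailing separator.
    public static string FormatDetected(IReadOnlyCollection<string> names) =>
        names.Count == 0
            ? "Recognized languages: 0"
            : $"Recognized languages: {names.Count} - {string.Join(", ", names)}";
}
```

To use it with the snippet above, one option is to collect each lang.InternalName into a List<string> inside the foreach loop and pass that list to FormatDetected instead of concatenating strings.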
Comments
Peter Kirchgessner
Thank you for the code snippet. From the FRE 12.4.7.63 user guide it was not clear that TSPV_Yes, and not TSPV_Auto (the default), must be set to use autodetection.
Radityo Pratomo
Hey Sergey, is it possible to get the detected language in ABBYY Vantage? Would you mind sharing the wisdom?
Sergey Pilipchuk
Hi Radityo,
You can use the OCR skill JSON export to get a list of detected languages.