[FineReader Engine 10] How to get rid of extra symbols and characters from output?

Apart from the predefined text languages, I am using the following code to get only digits in the final output, but the output still contains letters and symbols.

    HRESULT res;
    CSafePtr<IBaseLanguage> baseLanguage;
    res = engine->CreateBaseLanguage(&baseLanguage);

    res = baseLanguage->put_LetterSet(BLLS_Alphabet, CBstr(L"0123456789"));

    CSafePtr<ITextLanguage> textLanguage;
    res = engine->CreateTextLanguage(&textLanguage);

    CSafePtr<IBaseLanguages> baseLanguages;
    res = textLanguage->get_BaseLanguages(&baseLanguages);
    res = baseLanguages->Add(baseLanguage);
    res = baseLanguages->Item(0, &baseLanguage);
    res = engine->CreatePageProcessingParams(&pageProcessingParams);
    CSafePtr<IRecognizerParams> recognizerParams;
    res = pageProcessingParams->get_RecognizerParams(&recognizerParams);
    res = recognizerParams->get_TextLanguage(&textLanguage);
    frDocument->Process(pageProcessingParams, 0, 0);

Comments

  • SDK_support

    English is the default recognition language. To change it, use the SetPredefinedTextLanguage method of the RecognizerParams object.

    In your code snippet, you only add your language to a collection of base languages. To learn how to create and set a custom language, see Help → Guided Tour → Advanced Techniques → Working with Languages, and the CustomLanguage code sample.

    Note, however, that if you set a language that contains only digits and the image contains letters, FRE will try to recognize those letters as digits.

    Alternatively, you could extract the digits from the recognized text in post-processing.
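
    As a rough illustration of the post-processing idea, here is a minimal sketch in standard C++. It assumes the recognized text has already been copied out of the document into a std::wstring; keepDigits is a hypothetical helper name, not part of the FineReader Engine API, and it only keeps the ASCII digits 0-9.

```cpp
#include <algorithm>
#include <iterator>
#include <string>

// Hypothetical helper (not a FineReader Engine function): returns a copy of
// the recognized text that contains only the ASCII digits 0-9, in order.
std::wstring keepDigits(const std::wstring& recognizedText)
{
    std::wstring digits;
    digits.reserve(recognizedText.size());

    // Copy a character only if it falls in the range '0'..'9'.
    std::copy_if(recognizedText.begin(), recognizedText.end(),
                 std::back_inserter(digits),
                 [](wchar_t ch) { return ch >= L'0' && ch <= L'9'; });

    return digits;
}
```

    For example, keepDigits(L"Invoice #A-1029") yields L"1029". Because this filters the output rather than constraining recognition, letters in the image are dropped instead of being misread as digits.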
