SetPredefinedTextLanguage to Spanish and boost performance

Hi, I have code that performs a PDF to RTF conversion.

This is a simple snippet of what I do to change the recognition language:

    LoadFREngine();

    CSafePtr<IDocumentProcessingParams> documentProcessingParams;
    CheckResult( FREngine->CreateDocumentProcessingParams( &documentProcessingParams ) );
    CSafePtr<IPageProcessingParams> pageProcessingParams;
    CheckResult( documentProcessingParams->get_PageProcessingParams( &pageProcessingParams ) );
    CSafePtr<IRecognizerParams> recognizerParams;
    CheckResult( pageProcessingParams->get_RecognizerParams( &recognizerParams ) );
    CheckResult( recognizerParams->SetPredefinedTextLanguage( L"Spanish" ) );

but when I run the makefile, I get the following warning

PDF.cpp:36:74: warning: ISO C++ forbids converting a string constant to ‘BSTR {aka wchar_t*}’ [-Wwrite-strings]
     CheckResult( recognizerParams->SetPredefinedTextLanguage( L"Spanish" ) );


Is this normal behaviour? The language setting doesn't seem to take effect, because the accent marks are not recognized.

Also, some of the documents are not completely OCR'ed. Some of them are converted to a single paragraph, because part of the text is detected as an image. Is there any way to solve this?

 

Thanks.

Comments

3 comments

  • Diana Khammatova

    Hi! Your code looks correct. After you’ve set the language, you should pass documentProcessingParams to the Process method:

        CheckResult( frDocument->Process( documentProcessingParams ) );

    Could you please check whether the documentProcessingParams object was passed to the Process method?

    As for the second question, could you please provide us with the following additional information:

    1. The build number of FineReader Engine
    2. All recognition and export settings that you use
    3. Could you please describe your recognition scenario in detail? What information should be extracted from the document?
    4. Please attach your input and output documents
  • Omar López Rubio

    I was passing the wrong parameter to Process! You are absolutely right! Thanks!

    1. Build 11.1.14.686141

    2. I'm using the following code:

    // Standard headers used below. The FineReader Engine sample helpers
    // (LoadFREngine, CheckResult, CSafePtr, CBstr, Concatenate, CAbbyyException)
    // are assumed to be included elsewhere in the file.
    #include <cstdio>
    #include <ctime>
    #include <iostream>
    #include <string>

    void performOCR(std::string fp) {
      try {
        clock_t t = clock();

        // Widen the file name (element-wise copy, only correct for ASCII names)
        std::wstring file_wstring = std::wstring(fp.begin(), fp.end());
        const wchar_t* file_path = file_wstring.c_str();

        // Load ABBYY FineReader Engine
        std::cout << "Initializing Engine for file " << fp << std::endl;
        LoadFREngine();

        // Set the recognition language on the document processing parameters
        CSafePtr<IDocumentProcessingParams> documentProcessingParams;
        CheckResult( FREngine->CreateDocumentProcessingParams( &documentProcessingParams ) );
        CSafePtr<IPageProcessingParams> pageProcessingParams;
        CheckResult( documentProcessingParams->get_PageProcessingParams( &pageProcessingParams ) );
        CSafePtr<IRecognizerParams> recognizerParams;
        CheckResult( pageProcessingParams->get_RecognizerParams( &recognizerParams ) );
        CheckResult( recognizerParams->SetPredefinedTextLanguage( L"Spanish" ) );
        //recognizerParams->SetPredefinedTextLanguage(L"English,Spanish,German");

        // Create document from image file
        std::cout << "Loading PDF..." << std::endl;
        CBstr imagePath = Concatenate(L"./data/", file_path);
        CSafePtr<IFRDocument> frDocument = 0;
        CheckResult( FREngine->CreateFRDocumentFromImage( imagePath, 0, frDocument.GetBuffer() ) );

        // Recognize the document once, passing the customized parameters
        std::cout << "Recognizing..." << std::endl;
        CheckResult( frDocument->Process( documentProcessingParams ) );

        // Save results
        fp += ".rtf";
        std::wstring output_wstring = std::wstring(fp.begin(), fp.end());
        const wchar_t* output_wchar = output_wstring.c_str();
        std::cout << "Saving Results..." << std::endl;
        CBstr exportPath = Concatenate(L"./output/", output_wchar);
        CheckResult( frDocument->Export( exportPath, FEF_RTF, 0 ) );

        // Unload ABBYY FineReader Engine
        std::cout << "Deinitializing Engine..." << std::endl;
        UnloadFREngine();

        // clock() reports processor time used by this process
        t = clock() - t;
        std::cout << "Time processing: " << float(t)/CLOCKS_PER_SEC << " seconds" << std::endl;
        std::cout << std::endl;
      }
      catch( CAbbyyException& e ) {
        // Print the message through a fixed format string
        wprintf( L"%ls\n", e.Description() );
      }
    }

    I'm using legal documents as input, with logos, stamps and noise. I can't upload any of them because they're confidential.

  • Diana Khammatova

    You can send this information to SDK_Support@abbyy.com. In addition, the following articles may help to improve the quality of recognition:

    • Guided Tour→Best Practices→Source Image Recommendations→Tips for Document Scanning
    • Guided Tour→Best Practices→Source Image Recommendations→Tips for Taking Photos
    • Guided Tour→Advanced Techniques→Working with Images
    • Guided Tour→Best Practices→Improving Recognition Quality