
How to specify DocumentProcessingParams when performing partial OCR (Answered)

Hello!

Is there a way to specify "DocumentProcessingParams" when running OCR on only part of a page?

With the code below, calling "document.Process()" runs OCR on the entire document.

My goal is to reduce processing time by recognizing only part of the page, because OCR on the whole document takes a long time.

 

Code:


// Engine Load

document = engine.CreateFRDocument();
document.AddImageFile(strImagePath, null, null);
document.Pages[0].Layout.Clean();
document.Pages[0].Layout.Blocks.DeleteAll();

// Rect
IRegion region = engine.CreateRegion();
region.AddRect(0, 0, 300, 300);
IBlock newBlock = document.Pages[0].Layout.Blocks.AddNew(FREngine.BlockTypeEnum.BT_Text, region);
ITextBlock textBlock = newBlock.GetAsTextBlock();
textBlock.RecognizerParams.TextTypes = (int)FREngine.TextTypeEnum.TT_Normal;
textBlock.RecognizerParams.SetPredefinedTextLanguage("JapaneseModern");
//DocumentProcessingParams dpp = engine.CreateDocumentProcessingParams();
//dpp.PageProcessingParams.ObjectsExtractionParams.SourceContentReuseMode = SourceContentReuseModeEnum.CRM_DoNotReuse;
//document.Pages[0].RecognizeBlocks(null, null, dpp.PageProcessingParams.ObjectsExtractionParams);
//document.Pages[0].Synthesize();

// DocumentProcessingParams
DocumentProcessingParams dpp = engine.CreateDocumentProcessingParams();
dpp.PageProcessingParams.PageAnalysisParams.DetectBarcodes = true;
dpp.PageProcessingParams.PageAnalysisParams.AggressiveTableDetection = true;
dpp.PageProcessingParams.PagePreprocessingParams.CorrectOrientation = true;
dpp.PageProcessingParams.PagePreprocessingParams.OrientationDetectionParams.OrientationDetectionMode = OrientationDetectionModeEnum.ODM_Normal;

// Recognition
document.Process(dpp);

// Engine Unload


Comments

4 comments

  • Aleksandra Zendrikova

    Hi,

    Document processing in ABBYY FineReader Engine consists of several steps: page preprocessing, analysis, recognition, page synthesis, document synthesis, and export. The Process method performs every step except export, for the whole document.

    To answer your question, there are several ways to split document processing.
     
    Firstly, you can perform only the steps your document needs by using the Preprocess, Analyze, Recognize, and Synthesize methods of the FRDocument object.

    Secondly, you can use the PreprocessPages, AnalyzePages, RecognizePages, and SynthesizePages methods of the FRDocument object to process specific pages.

    And finally, it is possible to work with pages directly via the FRPage object and define parameters for an individual page. Note the Preprocess, Analyze, and Recognize methods of the FRPage object, which perform the processing steps for a single page (export is not included).


    You can find more information about tuning the processing parameters in Developer’s Help → Guided Tour → Advanced Techniques → Tuning Parameters of Page Preprocessing, Analysis, Recognition, and Synthesis.

  • OHTSUKA Takeshi

    Hi.

    Thank you for answering.
    When I call the "AnalyzePages" method so that I can use "PageAnalysisParams", the entire page is recognized.
    How can I change the code to recognize only a designated area?

    Code:

    
    // PagePreprocessingParams
    PagePreprocessingParams ppp = engine.CreatePagePreprocessingParams();
    ppp.CorrectOrientation = m_bUseOrientationDetectionMode;
    ppp.OrientationDetectionParams.OrientationDetectionMode = OrientationDetectionModeEnum.ODM_Normal;
    
    // ObjectsExtractionParams
    ObjectsExtractionParams oep = engine.CreateObjectsExtractionParams();
    oep.SourceContentReuseMode = SourceContentReuseModeEnum.CRM_DoNotReuse;
    
    // RecognizerParams
    RecognizerParams rp = engine.CreateRecognizerParams();
    rp.SetPredefinedTextLanguage(m_strLanguage);
    rp.SaveCharacterRecognitionVariants = true;
    rp.SaveWordRecognitionVariants = true;
    
    // PageAnalysisParams
    PageAnalysisParams pap = engine.CreatePageAnalysisParams();
    pap.DetectBarcodes = true;
    pap.AggressiveTableDetection = true;
    
    // SynthesisParamsForDocument
    SynthesisParamsForDocument spfd = engine.CreateSynthesisParamsForDocument();
    spfd.FontSet.SystemFontSet.FontNamesFilter = (int)FontNamesFiltersEnum.FNF_Japanese;
    
    // PreprocessPages
    document.PreprocessPages(null, ppp, oep, rp, null);
    
    // AnalyzePages
    document.AnalyzePages(null, pap, oep, rp);
    
    // Area
    IRegion region = engine.CreateRegion();
    region.AddRect(0, 0, 300, 300);
    IBlock newBlock = document.Pages[0].Layout.Blocks.AddNew(FREngine.BlockTypeEnum.BT_Text, region);
    ITextBlock textBlock = newBlock.GetAsTextBlock();
    textBlock.RecognizerParams = rp;
                    
    // RecognizeBlocks
    document.Pages[0].RecognizeBlocks(null, null, oep);
    
    // PageSynthesize
    document.Pages[0].Synthesize(spfd);
    
  • Aleksandra Zendrikova
    Hi,

    AnalyzePages doesn't actually perform recognition; it analyzes the page and creates a layout for subsequent recognition. If you want to recognize only a designated area and create the layout yourself, you don't need to call AnalyzePages at all.

    You can find all the necessary steps in Developer’s Help → Guided Tour → Advanced Techniques → Working with Layout and Blocks, in the "Adding blocks manually" section.
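
    A minimal sketch of that flow, assembled only from calls that already appear in the code posted in this thread, might look like the following. It is an illustration under stated assumptions, not the procedure from the Developer's Help: the class and method names (PartialOcrSketch, RecognizeRegion) are hypothetical, the IEngine parameter type is assumed, the engine is assumed to be loaded already, and the argument lists are copied from the posts above without being checked against the API reference.

    ```csharp
    using FREngine;

    static class PartialOcrSketch
    {
        // Hypothetical helper: recognizes one rectangle on the first page.
        // 'engine' is assumed to be an already loaded FineReader Engine object.
        public static void RecognizeRegion(IEngine engine, string imagePath, string language)
        {
            var document = engine.CreateFRDocument();
            document.AddImageFile(imagePath, null, null);

            // No AnalyzePages call: the layout is built by hand instead.
            var page = document.Pages[0];
            page.Layout.Blocks.DeleteAll();

            // Define the area to recognize (same rectangle as in the question).
            IRegion region = engine.CreateRegion();
            region.AddRect(0, 0, 300, 300);
            IBlock block = page.Layout.Blocks.AddNew(BlockTypeEnum.BT_Text, region);

            // Recognition settings are attached to the block itself.
            ITextBlock textBlock = block.GetAsTextBlock();
            textBlock.RecognizerParams.SetPredefinedTextLanguage(language);

            ObjectsExtractionParams oep = engine.CreateObjectsExtractionParams();
            oep.SourceContentReuseMode = SourceContentReuseModeEnum.CRM_DoNotReuse;

            // Recognize only the manually added block(s); the argument list
            // mirrors the call in the question and is an assumption here.
            page.RecognizeBlocks(null, null, oep);

            // Page synthesis would follow here, as in the code above; it is
            // left out because its exact parameters are not confirmed in this thread.
        }
    }
    ```

    Because the only block in the layout is the one added by hand, the recognition call has nothing else to work on, which is what keeps processing limited to the designated area. Page analysis settings such as DetectBarcodes or AggressiveTableDetection play no part in this flow, since no analysis step runs.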
  • OHTSUKA Takeshi

    Hi.

    Thank you for answering.
    I understand now that the analysis parameters cannot be applied to only a specified area.
    It was very helpful.
