
How to perform handwriting recognition on OCR-completed files (Answered)

When I run handwriting recognition on a file that has already been OCRed, the handwriting is not recognized correctly.
I confirmed that handwriting recognition works correctly on files that have not been OCRed.
How should I handle handwriting recognition for OCR-completed files?

I tried setting the "SourceContentReuseMode" property of "ObjectsExtractionParams" to "CRM_DoNotReuse", but it had no effect.

My source code is based on the answers to this earlier question:

https://forum.ocrsdk.com/thread/how-to-recognizing-handprinted-texts/


Comments

6 comments

  • Helen Osetrova

    Hello!

     

    Could you tell us how you use the ObjectsExtractionParams object? Do you pass it as a parameter to the IFRDocument::Recognize() method?

     

    Could you please also post here your source code and the image to be processed? 

  • OHTSUKA Takeshi

    Hello.

    I have attached the source code and the corresponding file. Please take a look.

     

    Code

    
    // This code throws an error
    // (HRESULT exception 0x80040154, REGDB_E_CLASSNOTREG):
    // ObjectsExtractionParams objExParam = new ObjectsExtractionParams();
    // objExParam.SourceContentReuseMode = SourceContentReuseModeEnum.CRM_DoNotReuse;
    // document.Recognize(null, objExParam);
    // document.Synthesize(null);

    // This code does not work
    DocumentProcessingParams dpp = engine.CreateDocumentProcessingParams();
    dpp.PageProcessingParams.ObjectsExtractionParams.SourceContentReuseMode = SourceContentReuseModeEnum.CRM_DoNotReuse;
    document.Recognize(null, dpp.PageProcessingParams.ObjectsExtractionParams);
    document.Synthesize(null);

     

  • Helen Osetrova

    Hello,

     

    Thank you for the provided information!

     

    Please note that before recognition you must either perform layout analysis or build the page layout yourself. Without this step, FineReader Engine cannot find any blocks on the page. For more information about the processing stages, see the Developer's Help → Guided Tour → Advanced Techniques → Tuning Parameters of Page Preprocessing, Analysis, Recognition, and Synthesis article.

     

    Because automatic layout analysis is not supported for handprinted text, please apply the approach described in the topic https://forum.ocrsdk.com/thread/how-to-recognizing-handprinted-texts/ to add the necessary blocks to the page layout manually. After adding the blocks, call the IFRPage::RecognizeBlocks() method to recognize them.

     

    The code snippet below demonstrates adding and recognizing the top-left handwritten block of your sample file:

    // Get the layout of the first page
    FREngine.FRPages pages = document.Pages;
    FREngine.FRPage page = pages.Item(0);
    FREngine.Layout layout = page.Layout;

    // Set the block region
    FREngine.Region region = engineLoader.Engine.CreateRegion();
    region.AddRect(321, 458, 1040, 524);

    // Create a new block
    FREngine.IBlock newBlock = layout.Blocks.AddNew(FREngine.BlockTypeEnum.BT_Text, region, 0);
    FREngine.TextBlock textBlock = newBlock.GetAsTextBlock();

    // Specify the text parameters
    textBlock.RecognizerParams.TextTypes = (int)FREngine.TextTypeEnum.TT_Handprinted;
    textBlock.RecognizerParams.SetPredefinedTextLanguage("Digits");

    // Specify the type of marking around the letters
    textBlock.RecognizerParams.FieldMarkingType = FREngine.FieldMarkingTypeEnum.FMT_SimpleText;
    textBlock.RecognizerParams.WritingStyle = FREngine.WritingStyleEnum.WS_Japanese;

    // Replace page in the document with the new one
    page.Layout = layout;
    document.Pages.DeleteAt(0);
    document.AddPage(page);

    // Tune the ObjectsExtractionParams object
    FREngine.DocumentProcessingParams dpp = engineLoader.Engine.CreateDocumentProcessingParams();
    dpp.PageProcessingParams.ObjectsExtractionParams.SourceContentReuseMode = FREngine.SourceContentReuseModeEnum.CRM_DoNotReuse;

    // Recognize blocks 
    document.Pages[0].RecognizeBlocks(null, null, dpp.PageProcessingParams.ObjectsExtractionParams);
    document.Synthesize(null);               

    // Save the result
    document.Export(@"D:\Temp\reuse_recognized.pdf", FREngine.FileExportFormatEnum.FEF_PDF, null);

     

    Please find attached the result achieved with the help of the given example.

     

    Hope this will help you!  

     

  • OHTSUKA Takeshi

    Thank you for investigating and for the sample code!
    I apologize: in the end, the problem was that my coordinate data was misaligned.
    The image embedded in the PDF before processing and the image embedded in the PDF after processing have different resolutions.

    Before processing: 793 px × 1121 px
    After processing: 2478 px × 3503 px

    It seems the handwriting could not be extracted because coordinate values taken from the pre-processing image fall within the margin of the processed image.

    I am attaching three files: the PDF before processing, the PDF after processing, and the one with corrected coordinate values.
    Thank you also for the detailed advice on topics such as layout analysis. I will refer to it.
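
    The correction described above amounts to proportional scaling between the two image sizes. The following is a minimal sketch, assuming the two images differ only by a uniform resize (no crop or offset). RegionScaler and its methods are hypothetical helpers, not part of the FineReader Engine API, and the rectangle used in Main is an invented example, not a value taken from the attached files.

    ```csharp
    using System;

    // Hypothetical helper (not a FineReader Engine class): rescales a rectangle
    // measured on one image so that it fits an image of a different pixel size.
    public static class RegionScaler
    {
        // Maps a single coordinate from the source scale to the target scale.
        public static int Scale(int value, int sourceSize, int targetSize)
        {
            return (int)Math.Round((double)value * targetSize / sourceSize);
        }

        // Returns { left, top, right, bottom } in target-image pixels.
        public static int[] ScaleRect(
            int left, int top, int right, int bottom,
            int srcWidth, int srcHeight, int dstWidth, int dstHeight)
        {
            return new[]
            {
                Scale(left, srcWidth, dstWidth),
                Scale(top, srcHeight, dstHeight),
                Scale(right, srcWidth, dstWidth),
                Scale(bottom, srcHeight, dstHeight)
            };
        }

        public static void Main()
        {
            // Image sizes are the ones reported in this thread;
            // the rectangle itself is a made-up example.
            int[] r = ScaleRect(100, 150, 330, 170, 793, 1121, 2478, 3503);
            Console.WriteLine(string.Join(", ", r)); // prints "312, 469, 1031, 531"
        }
    }
    ```

    The four scaled values could then be passed to Region.AddRect() in the same order as in the snippet earlier in this thread.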

  • Helen Osetrova

    Hi! 

     

    For recognition, FineReader Engine uses the binarized copy of the initial image. This is a special format suitable for OCR. For documents scanned at lower resolutions (less than 120 dpi) and documents with small fonts (less than 10 pt), the images may be digitally enlarged to achieve better OCR quality. (See the source image recommendations on the related page.)
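
    As a rough check of why enlargement may have applied to the sample file, the pixel sizes reported earlier in this thread can be converted to an approximate resolution. This sketch assumes the page is A4 (210 mm × 297 mm), which is not stated anywhere in the thread. DpiEstimate is a hypothetical helper, not a FineReader Engine call.

    ```csharp
    using System;

    // Hypothetical helper: estimates resolution from a pixel count
    // and the physical length that those pixels cover.
    public static class DpiEstimate
    {
        private const double MillimetersPerInch = 25.4;

        // Approximate dots per inch for `pixels` pixels spanning `millimeters` mm.
        public static double FromSize(int pixels, double millimeters)
        {
            return pixels * MillimetersPerInch / millimeters;
        }

        public static void Main()
        {
            const double a4WidthMm = 210.0; // ASSUMPTION: the page is A4

            // Widths reported in this thread, before and after processing.
            Console.WriteLine(Math.Round(FromSize(793, a4WidthMm)));
            Console.WriteLine(Math.Round(FromSize(2478, a4WidthMm)));
        }
    }
    ```

    Under the A4 assumption, the 793 px image works out to roughly 96 dpi, which is below the 120 dpi figure mentioned above, and the 2478 px image to roughly 300 dpi.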

     

    In this case, the coordinates of the block region should be taken from the binarized image. To obtain it, call the SaveToFile() method of the ImageDocument object. For a description of the internal image format of FineReader Engine, see the Developer's Help → API Reference → ImageDocument Object article. The Developer's Help → Guided Tour → Advanced Techniques → Working with Images section may also be useful.

     

    Hope this information will be helpful! The binarized copy of your sample document is attached to this post.

  • OHTSUKA Takeshi

    Hi!

    Thank you for the supplementary information.
    We will try adjusting the coordinates based on it.
