How to perform handwriting recognition on OCR-completed files (Answered)

Written by Permanently deleted user

September 13, 2018 06:06

If I run handwriting recognition on a file that has already been OCRed, the handwriting is not recognized correctly.
I confirmed that handwriting recognition works correctly on files that have not been OCRed.
How should I handle handwriting recognition for OCR-completed files?

I tried setting the "SourceContentReuseMode" property of "ObjectsExtractionParams" to "CRM_DoNotReuse", but it had no effect.

My source code is based on the answer to a previous question:

https://forum.ocrsdk.com/thread/how-to-recognizing-handprinted-texts/

Comments

Helen Osetrova

September 14, 2018 14:04
Hello!

Could you tell us how you use the ObjectsExtractionParams object? Do you pass it as a parameter of the IFRDocument::Recognize() method?

Could you also post your source code and the image to be processed here?

Permanently deleted user

September 18, 2018 00:59

Hello.

I have attached the source code and the corresponding file. Please take a look.

Code

// This code throws an error
// (HRESULT Exception: 0x80040154 (REGDB_E_CLASSNOTREG))
//
// ObjectsExtractionParams objExParam = new ObjectsExtractionParams();
// objExParam.SourceContentReuseMode = SourceContentReuseModeEnum.CRM_DoNotReuse;
// document.Recognize(null, objExParam);
// document.Synthesize(null);

// This code does not work (the handwriting is still not recognized)
DocumentProcessingParams dpp = engine.CreateDocumentProcessingParams();
dpp.PageProcessingParams.ObjectsExtractionParams.SourceContentReuseMode = SourceContentReuseModeEnum.CRM_DoNotReuse;
document.Recognize(null, dpp.PageProcessingParams.ObjectsExtractionParams);
document.Synthesize(null);

586a8421-562d-4abf-93c4-a95f00108df1_test-20180918-ocr.pdf

Helen Osetrova

September 18, 2018 14:39

Hello,

Thank you for the provided information!

Please note that before recognition you should either perform layout analysis or build the page layout yourself. Without this step, FineReader Engine will not be able to find any blocks on the page. Please review the Developer's Help → Guided Tour → Advanced Techniques → Tuning Parameters of Page Preprocessing, Analysis, Recognition, and Synthesis article for more information about the processing stages.

As automatic layout analysis is not supported for handprinted texts, please apply the approach described in the topic https://forum.ocrsdk.com/thread/how-to-recognizing-handprinted-texts/ to add the necessary blocks to the page layout manually. After adding the blocks, call the IFRPage::RecognizeBlocks() method to recognize them.

Please see below the code snippet which demonstrates adding and recognizing the top left handwritten block of your sample file:

// Get the layout of the first page
FREngine.FRPages pages = document.Pages;
FREngine.FRPage page = pages.Item(0);
FREngine.Layout layout = page.Layout;

// Set the block region
FREngine.Region region = engineLoader.Engine.CreateRegion();
region.AddRect(321, 458, 1040, 524);

// Create a new block
FREngine.IBlock newBlock = layout.Blocks.AddNew(FREngine.BlockTypeEnum.BT_Text, region, 0);
FREngine.TextBlock textBlock = newBlock.GetAsTextBlock();

// Specify the text parameters
textBlock.RecognizerParams.TextTypes = (int)FREngine.TextTypeEnum.TT_Handprinted;
textBlock.RecognizerParams.SetPredefinedTextLanguage("Digits");

// Specify the type of marking around the letters
textBlock.RecognizerParams.FieldMarkingType = FREngine.FieldMarkingTypeEnum.FMT_SimpleText;
textBlock.RecognizerParams.WritingStyle = FREngine.WritingStyleEnum.WS_Japanese;

// Replace page in the document with the new one
page.Layout = layout;
document.Pages.DeleteAt(0);
document.AddPage(page);

// Tune the ObjectsExtractionParams object 
FREngine.DocumentProcessingParams dpp = engineLoader.Engine.CreateDocumentProcessingParams();
dpp.PageProcessingParams.ObjectsExtractionParams.SourceContentReuseMode = FREngine.SourceContentReuseModeEnum.CRM_DoNotReuse;

// Recognize blocks 
document.Pages[0].RecognizeBlocks(null, null, dpp.PageProcessingParams.ObjectsExtractionParams);
document.Synthesize(null);               

// Save the result
document.Export(@"D:\Temp\reuse_recognized.pdf", FREngine.FileExportFormatEnum.FEF_PDF, null);

Please find attached the result achieved with the help of the given example.

Hope this will help you!

fcd6dc8b-412f-43b1-aa4f-a95f00f279b7_reuse-recognized.pdf

Permanently deleted user

September 19, 2018 07:52
Thank you for investigating and for the sample code!
I'm sorry, though: it turned out that the cause was misaligned coordinate data.
The image embedded in the PDF before processing and the image embedded in the PDF after processing have different resolutions.

Before processing: 793px * 1121px
After processing: 2478px * 3503px

It seems the handwriting could not be extracted because coordinate values measured on the image before processing fall in the margin of the processed image.
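
The size mismatch above can be compensated for by rescaling the block coordinates. The following is only an illustrative sketch, not FineReader Engine code: `RegionScaler` and `Scale` are hypothetical names, and the input rectangle is made up. It assumes the processed image is a uniformly enlarged copy of the source image with no cropping or offset, and it uses the two image sizes reported in this post.

```csharp
using System;

// Hypothetical helper for illustration; not part of the FineReader Engine API.
public static class RegionScaler
{
    // Maps a rectangle given in source-image pixels onto the pixel grid
    // of an enlarged copy of the same image.
    public static (int Left, int Top, int Right, int Bottom) Scale(
        int left, int top, int right, int bottom,
        int srcWidth, int srcHeight, int dstWidth, int dstHeight)
    {
        // Separate horizontal and vertical factors, since the two ratios
        // can differ slightly because of rounding in the image sizes.
        double sx = (double)dstWidth / srcWidth;
        double sy = (double)dstHeight / srcHeight;

        return ((int)Math.Round(left * sx), (int)Math.Round(top * sy),
                (int)Math.Round(right * sx), (int)Math.Round(bottom * sy));
    }

    public static void Main()
    {
        // Image sizes from this thread: 793 x 1121 before, 2478 x 3503 after.
        // The rectangle (100, 150, 330, 170) is an invented example value.
        var r = Scale(100, 150, 330, 170, 793, 1121, 2478, 3503);
        Console.WriteLine($"{r.Left}, {r.Top}, {r.Right}, {r.Bottom}");
    }
}
```

The scaled values could then be passed to region.AddRect() as in the snippet earlier in this thread. Measuring the coordinates directly on the image that the engine actually processes would avoid the conversion altogether.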

I have attached three files: the PDF before processing, the PDF after processing, and the result obtained with the corrected coordinate values.
Thank you also for your detailed advice on topics such as layout analysis. I will refer to it.

20239045-a9c7-4fc1-bcdf-a96000823870_test-20180918-before.pdf

65cb22d7-6b74-4611-b7cd-a96000824f36_test-20180916-after.pdf

bba72013-6d16-4f0b-a454-a9600083595e_test-20180918-ok.pdf

Helen Osetrova

September 21, 2018 17:23
Hi!

For recognition, FineReader Engine uses the binarized copy of the initial image. This is a special format suitable for OCR. For documents scanned at lower resolutions (less than 120 dpi) and documents with small fonts (less than 10 pt), the images may be digitally enlarged to achieve better OCR quality. (See the source image recommendations on the related page.)

In this case, the coordinates of the block region should be taken from the binarized image. To obtain it, call the SaveToFile() method of the ImageDocument object. Please review the Developer's Help → API Reference → ImageDocument Object article for a description of the internal image format of FineReader Engine. The Developer's Help → Guided Tour → Advanced Techniques → Working with Images section may also be useful.

Hope this information will be helpful! The binarized copy of your sample document is attached to this post.
da443991-bb95-41ee-938e-a962011f0ad5_reuse-bin.jpg

Permanently deleted user

September 27, 2018 02:09
Hi!

Thank you for the supplementary information.
We will try adjusting the coordinates based on it.
