When I run handwriting recognition on a file that has already been OCRed, the handwriting is not recognized correctly.
I confirmed that handwriting recognition works correctly on files that have not been OCRed.
How should I handle handwriting recognition for files that have already been OCRed?
I tried setting the "SourceContentReuseMode" property of "ObjectsExtractionParams" to "CRM_DoNotReuse", but it had no effect.
My source code is based on the answers to this earlier question:
https://forum.ocrsdk.com/thread/how-to-recognizing-handprinted-texts/
Comments
Hello!
Could you tell us how you use the ObjectsExtractionParams object? Do you pass it as a parameter of the IFRDocument::Recognize() method?
Could you please also post here your source code and the image to be processed?
Hello.
I have attached the source code and the corresponding file. Please take a look.
Code
Hello,
Thank you for the provided information!
Please note that before recognition you should either perform layout analysis or build the page layout yourself. Without this step, FineReader Engine will not be able to find any blocks on the page. Please review the Developer's Help → Guided Tour → Advanced Techniques → Tuning Parameters of Page Preprocessing, Analysis, Recognition, and Synthesis article for more information about the processing stages.
As automatic layout analysis is not supported for handprinted text, please apply the approach described in the topic https://forum.ocrsdk.com/thread/how-to-recognizing-handprinted-texts/ to add the necessary blocks to the page layout manually. After adding the blocks, call the IFRPage::RecognizeBlocks() method to recognize them.
Please see below the code snippet which demonstrates adding and recognizing the top left handwritten block of your sample file:
Please find attached the result achieved with the help of the given example.
Hope this will help you!
Thank you for the investigation and the sample code!
I apologize, though: the problem turned out to be on my side.
In short, my coordinate data was misaligned.
The image embedded in the PDF before processing and the image embedded in the PDF after processing have different resolutions.
Before processing: 793px * 1121px
After processing: 2478px * 3503px
It seems the handwriting could not be extracted because the coordinate values taken from the image before processing fall in the margin of the image after processing.
I have corrected the coordinate values to account for the difference before and after processing, and I am attaching 3 files.
Also, thank you for your detailed advice such as layout analysis. I will refer to it.
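The coordinate correction described above can be sketched roughly as follows. This is only an illustration in Python, not FineReader Engine code: the `Rect` type, the `rescale_rect` helper, and the sample block coordinates are hypothetical. Only the two image sizes come from this thread. It assumes the enlarged image is a plain scaled copy of the original with no cropping or offset.

```python
from typing import NamedTuple, Tuple


class Rect(NamedTuple):
    """Hypothetical block rectangle in pixel coordinates."""
    left: int
    top: int
    right: int
    bottom: int


def rescale_rect(rect: Rect, src_size: Tuple[int, int], dst_size: Tuple[int, int]) -> Rect:
    """Map a rectangle measured on an image of src_size (width, height)
    onto an image of dst_size, assuming a pure scale with no offset."""
    sx = dst_size[0] / src_size[0]  # horizontal scale factor
    sy = dst_size[1] / src_size[1]  # vertical scale factor
    return Rect(
        left=round(rect.left * sx),
        top=round(rect.top * sy),
        right=round(rect.right * sx),
        bottom=round(rect.bottom * sy),
    )


# Image sizes reported in this thread.
before = (793, 1121)    # image embedded in the PDF before processing
after = (2478, 3503)    # image embedded in the PDF after processing

# Hypothetical block measured on the "before" image.
block = Rect(left=50, top=60, right=400, bottom=200)
scaled = rescale_rect(block, before, after)
print(scaled)
```

Both scale factors come out at about 3.125, so a block measured on the smaller image lands near the top left corner of the larger one unless its coordinates are rescaled first.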
Hi!
For recognition, FineReader Engine uses the binarized copy of the initial image. This is a special format suitable for OCR. For documents scanned at lower resolutions (less than 120 dpi) and documents with small fonts (less than 10 pt), the images may be digitally enlarged to achieve better OCR quality. (See the source image recommendations on the related page.)
In this case, the coordinates of the block region should be taken from the binarized image. To obtain it, call the SaveToFile() method of the ImageDocument object. Please review the Developer's Help → API Reference → ImageDocument Object article for a description of the internal image format of FineReader Engine. The Developer's Help → Guided Tour → Advanced Techniques → Working with Images section may also be useful.
Hope this information will be helpful! The binarized copy of your sample document is attached to this post.
Hi!
Thank you for the supplementary information.
We will try the adjustment based on the information.