OCR for old newspaper scans from microfiche

Written by Permanently deleted user

March 31, 2023 23:15

I'm digitizing old newspapers and using FineReader for OCR. The goal is not to get a perfect copy of the newspaper, but to make the digitized PDF keyword-searchable. To that end, the program is working well, but some sections of text are being identified as images and skipped during OCR. Since these files are 900 pages long, I don't have time to go through, select each box, and convert it to a text box. Is there a way to tell the program to detect everything as text? I don't care whether it captures the photos.

Thanks,

Kevin

Comments

3 comments

Nikolai Kromm

April 03, 2023 12:30
Hi Kevin,

Yes, you can disable image detection by setting the DetectPictures property of the PageAnalysisParams object to false.

Please note that this option is also present in the TextExtraction_Accuracy and TextExtraction_Speed predefined profiles.

Permanently deleted user

April 03, 2023 13:20
Thank you, Nikolai Kromm. I just realized that I may have posted this question in the wrong area. I'm using the off-the-shelf version of ABBYY FineReader and couldn't locate any setting for this.

Victoria Dvornikova

April 03, 2023 14:38
Hi Kevin,

I have moved your post to the FineReader desktop section in Community.

Unfortunately, there is no UI setting in FineReader to recognize entire pages as Text areas, but you can try the area template feature: create a template with a rectangular text area and apply it to all pages in the OCR Editor.
