I'm digitizing old newspapers and using FineReader for OCR. The goal is not to get a perfect copy of the newspaper, but to make the digitized PDF keyword-searchable. For that purpose the program is working well, but some sections of text are being identified as images and skipped during OCR. Since these files are 900 pages long, I don't have time to go through, select each box, and convert it to a text area. Is there a way to tell the program to detect everything as text? I don't care about capturing the photos.
Thanks,
Kevin
Comments
Hi Kevin,
Yes, you can disable image detection by setting the DetectPictures property of the PageAnalysisParams object to false.
Please note that this option is also set in the TextExtraction_Accuracy and TextExtraction_Speed predefined profiles.
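For anyone using the FineReader Engine SDK, here is a minimal sketch of both approaches. It is written in Python and assumes `engine` is an already initialized Engine object reached through COM automation (how you load and license the engine depends on your SDK version, so that part is left out). The helper names are made up for illustration, and the path from the document processing parameters down to PageAnalysisParams is an assumption you should check against the API reference for your version.

```python
def build_text_only_params(engine):
    """Return document processing params with picture detection turned off.

    Assumption: `engine` exposes CreateDocumentProcessingParams() and the
    params object nests PageProcessingParams -> PageAnalysisParams.
    """
    params = engine.CreateDocumentProcessingParams()
    analysis = params.PageProcessingParams.PageAnalysisParams
    # With picture detection off, page analysis should not mark regions as images.
    analysis.DetectPictures = False
    return params


def use_text_extraction_profile(engine, prefer_accuracy=True):
    """Load one of the predefined profiles named in the answer above.

    Assumption: `engine` exposes LoadPredefinedProfile(name).
    """
    name = "TextExtraction_Accuracy" if prefer_accuracy else "TextExtraction_Speed"
    engine.LoadPredefinedProfile(name)
    return name
```

You would then pass the returned params object to whatever processing call you already use, or load the profile once before creating any processing objects.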
Thank you, Nikolai Kromm. I just realized that I may have posted this question in the wrong area. I'm using the off-the-shelf version of ABBYY FineReader and couldn't find any setting for this.
Hi Kevin,
I have moved your post to the FineReader desktop section in Community.
Unfortunately, there is no UI setting in FineReader to recognize whole pages as text areas, but you can try the Area template feature: create a template with a rectangular text area and apply it to all pages in the OCR Editor.