OCR for old newspaper scans from microfiche

I'm digitizing old newspapers and using FineReader for OCR. The goal is not a perfect copy of the newspaper but a digitized PDF that is keyword-searchable. For that purpose the program works well, but some sections of text are identified as images and skipped by OCR. Since these files are 900 pages long, I don't have time to go through, select each box, and convert it to a text box. Is there a way to tell the program to detect everything as text? I don't care about capturing the photos.

Thanks,

Kevin


Comments

3 comments

  • Nikolai Kromm

    Hi Kevin,

    Yes, you can disable image detection by setting the DetectPictures property of the PageAnalysisParams object to false.

    Note that this option is also present in the TextExtraction_Accuracy and TextExtraction_Speed predefined profiles.
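For readers working with the FineReader Engine SDK rather than the desktop application, the property path described above might be set roughly as follows. This is a minimal sketch, not the SDK's documented sample: it assumes an already-initialized engine object that exposes `CreateDocumentProcessingParams()` and the `PageProcessingParams.PageAnalysisParams` chain. The helper name `make_text_only_params` and the `StubEngine` class are hypothetical; the stub exists only so the sketch runs without the SDK installed.

```python
from types import SimpleNamespace


def make_text_only_params(engine):
    """Return processing params with picture detection turned off.

    `engine` is assumed to be an initialized FineReader Engine object
    (for example, one obtained through the SDK's COM or wrapper API).
    """
    params = engine.CreateDocumentProcessingParams()
    # Treat every detected region as text instead of classifying some as pictures.
    params.PageProcessingParams.PageAnalysisParams.DetectPictures = False
    return params


class StubEngine:
    """Hypothetical stand-in that mimics only the attributes used above."""

    def CreateDocumentProcessingParams(self):
        analysis = SimpleNamespace(DetectPictures=True)
        page_params = SimpleNamespace(PageAnalysisParams=analysis)
        return SimpleNamespace(PageProcessingParams=page_params)


if __name__ == "__main__":
    params = make_text_only_params(StubEngine())
    print(params.PageProcessingParams.PageAnalysisParams.DetectPictures)
```

With the real SDK, the returned params object would then be passed to whatever processing call the application already uses. Loading one of the predefined profiles named above is the other route, since the option is present in those profiles.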

  • Kevin Murphy

    Thank you, Nikolai Kromm. I just realized that I may have posted this question in the wrong area. I'm using the off-the-shelf version of ABBYY FineReader and couldn't find any setting for this.

  • Victoria Dvornikova

    Hi Kevin,

    I have moved your post to the FineReader desktop section in Community.

    Unfortunately, FineReader has no UI setting for recognizing every page entirely as text areas, but you can try the Area template feature: create a template with a single rectangular text area and apply it to all pages in OCR Editor.
