How can I find all image-only PDFs in a folder?

I have lots of Windows folders that have a mix of OCR'd and image-only PDFs. I have two related questions.

1.  For any particular folder, how can I identify all of the image-only PDFs, either in a highlighted list or an output file? Does ABBYY have a way to do this, or is there some other program, batch file, or PowerShell script that can do this?

2.  I have a list of certain PDF files (but not all of the PDF files) located in a particular folder. Is there a way for ABBYY FineReader to run automatic OCR on those PDFs but not on the other PDFs?

Comments

1 comment

  • Scott Chau

    Larry,

    Other than our SDK product, which requires software development experience, FineReader Server would be the closest option.

    1. For any particular folder, how can I identify all of the image-only PDFs, either in a highlighted list or an output file? Does ABBYY have a way to do this, or is there some other program, batch file, or PowerShell script that can do this?
    • With FineReader Server you can set up an Audit-Workflow to get an overview of the files in a directory, including the number of documents that may need OCR (cf. https://youtu.be/1Q2cVt4XKCE).
    • However, there is no option to get a highlighted list or output file with the actual documents' names.
    2. I have a list of certain PDF files (but not all of the PDF files) located in a particular folder. Is there a way for ABBYY FineReader to run automatic OCR on those PDFs but not on the other PDFs?
    • FR Server can be configured to skip OCR for PDFs that are already searchable, and it also offers filter options so you can exclude specific formats from being processed. Unfortunately, I don't know of any way to limit the processing according to a list of files.
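
    Outside of FineReader, a small script may be enough to produce the list of file names. The following is only a rough Python sketch, not an ABBYY feature. It rests on the assumption that an image-only PDF declares no /Font resources, while an OCR'd PDF does (for its hidden text layer). The function names and the optional `only` allow-list of file names are hypothetical, introduced here for illustration. The check can misjudge encrypted PDFs or files that use compression filters other than Flate, so spot-check the results.

```python
import re
import zlib
from pathlib import Path

# Captures the raw bytes between "stream" and "endstream" in each PDF object.
_STREAM = re.compile(rb"stream\r?\n(.*?)endstream", re.DOTALL)


def looks_image_only(pdf_path):
    """Heuristic (an assumption, not a guarantee): no /Font anywhere => image-only."""
    data = Path(pdf_path).read_bytes()
    if b"/Font" in data:
        return False
    # PDF 1.5+ files may keep their dictionaries inside Flate-compressed
    # object streams, so inflate every stream that is valid zlib data.
    for match in _STREAM.finditer(data):
        try:
            inflated = zlib.decompressobj().decompress(match.group(1))
        except zlib.error:
            continue  # not Flate data (for example a JPEG image)
        if b"/Font" in inflated:
            return False
    return True


def find_image_only_pdfs(folder, only=None):
    """Return the sorted names of PDFs in `folder` that look image-only.

    `only` is an optional iterable of file names; when given, every other
    file is ignored (names are compared case-insensitively).
    """
    wanted = {name.lower() for name in only} if only is not None else None
    hits = []
    for path in sorted(Path(folder).iterdir()):
        if not path.is_file() or path.suffix.lower() != ".pdf":
            continue
        if wanted is not None and path.name.lower() not in wanted:
            continue
        if looks_image_only(path):
            hits.append(path.name)
    return hits
```

    Calling `find_image_only_pdfs(r"C:\Scans")` (a placeholder path) returns the matching file names, which can then be written to a text file. Passing `only=` with the names from your list restricts the check to those files; the resulting PDFs could then be copied into a separate folder and OCR'd from there.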

    For now, I'll move this post to our FineReader Server section.
