Is it possible to have an ABBYY product scan my collection of PDF files automatically and report duplicate files?
Background:
I have a large collection of PDF files, and some of them are more or less the same when you compare their contents (after OCR). I would like to scan all the PDF files automatically and get a report of files that are, for example, at least 90% the same, based on their contents after an OCR scan.
Basically something like Anti-Twin does for .jpg files, but now for PDF files.
Thanks in advance.
Comments
Our engine product doesn't have that option out of the box. You would have to check the file hashes programmatically.
Otherwise, look at our FineReader Server product. It has built-in functionality to detect duplicate files via hash.
Getting a report on duplicate files with FineReader Server 14 Audit Workflow – Help Center (abbyy.com)
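For the programmatic route, here is a minimal sketch in plain Python (standard library only, no ABBYY API involved). It assumes the PDFs sit under one folder. The helper names `file_sha256`, `find_exact_duplicates` and `text_similarity` are made up for illustration. Hash matching only finds byte-identical files, so the sketch also includes a simple word-level similarity ratio that could be applied to text you have already exported from OCR. How you obtain that text is left open here.

```python
import hashlib
from collections import defaultdict
from difflib import SequenceMatcher
from pathlib import Path


def file_sha256(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_exact_duplicates(folder):
    """Group PDF files under `folder` whose bytes are identical."""
    groups = defaultdict(list)
    for path in sorted(Path(folder).rglob("*.pdf")):
        groups[file_sha256(path)].append(path)
    # Keep only hashes shared by more than one file.
    return [paths for paths in groups.values() if len(paths) > 1]


def text_similarity(text_a, text_b):
    """Return a 0..1 similarity ratio between two already-extracted texts.

    Assumption: the OCR text was exported separately; this compares words.
    """
    return SequenceMatcher(None, text_a.split(), text_b.split()).ratio()
```

To approximate the "min. 90% the same" report, you could call `text_similarity` on each pair of exported texts and list the pairs scoring 0.9 or higher. Pairwise comparison gets slow on large collections, so treat this as a starting point rather than a finished tool.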