After running OCR on a PDF document, I save it as PDF. But I notice that the quality of the document becomes poor. Why is that, when I only need to add a text layer to the original document? Can anything be done about it?
Why does the visual quality of a PDF page become worse after OCR?
Comments
Hello Peter, I created a Support ticket for your question. Please expect an email from us.
I have this exact same question: I created a PDF from a bunch of JPEG files using another program. I open it and it looks great, but it's not searchable. So I thought I'd give FineReader a try since it can OCR and create the text layer. But the problem is, the output PDF looks terrible even when choosing a custom 100% image quality setting. I don't know how this stuff works, but there should be a way to have FineReader simply create and save the text layer and not do ANYTHING to the existing PDF quality or format. Instead, it looks like it's taking the OCR and "re-writing" the PDF, and it looks really lousy.
Please advise or I'll have to find another software package...
Hi Steve Joberson,
In FineReader PDF 15 (current version) you can open the PDF file in PDF Editor and select Recognize > Recognize document from the menu. In this case the quality of the file won't be changed.
In OCR Editor you can try using the image settings below (menu Tools > Options > Format Settings):
In ABBYY FineReader PDF 15 Build 15.0.114.4683; part # 1380.13 there is no such menu item:
Hello Ayyan Moyer,
The menu item Recognize > Recognize document is available only in PDF Editor. OCR Editor does rasterize the PDF image after OCR, but in PDF Editor you can launch recognition without rasterization.
Thanks for the clarification! This application is not in ABBYY.Store, nor is it in the comparison table between the "Standard", "Business" and "Corporate" versions. Therefore, I guess "PDF Editor" is included in all packages. Which of the following files does the PDF Editor application launch?
Ayyan Moyer,
FineReader PDF Editor can be started from the Start menu as below:
The process from your list is FineReader.exe
@Victoria, when I launch FineReader.exe, the ABBYY FineReader PDF 15 window appears. The process manager lists this application as "ABBYY FineReader PDF 15 (32 bit)". There is no "Recognize" menu in it:
Only "File", "Edit", "View", "Tools" and "Help". This item appears only when the document is open:
Is "File" / "Recognize Document" / "Recognize Document ... Ctrl + Shift + R" what you mean?
This item is not in the "File" menu when no document is open in the application.
Is this menu really in different places in different builds of the 15th version of the application?
Hi Ayyan Moyer,
Yes, this is another variant to start the same process. Previously I meant the following Recognize menu item in the toolbar of PDF Editor:
Hot Folder spoils documents just like the FineReader PDF OCR Editor does. Is there a way to batch-add an invisible text layer to documents, the way FineReader PDF does?
Hello Ayyan Moyer,
No, currently Hot Folder works only the way OCR Editor does. But I have forwarded your suggestion to our R&D department.
A small addition from me regarding a possible workaround in Hot Folder. You can try changing some settings in the Hot Folder task to check whether it works for you:
1. On the Save step, select PDF format and click Options.
2. Select Custom in Image Quality and disable MRC compression.
3. Disable the Reduce original resolution option.
4. Disable changes in Color and Image quality:
Victoria, thank you for your answer!
I ran a test with your recommended settings.
Original file: 34.5 MB. After adding a text layer using FineReader PDF: 33.9 MB. After processing by the Hot Folder application: 114 MB. The size has more than tripled!
The quality is unsatisfactory:
(on the left is the original, on the right is the document after processing it in the Hot Folder)
Hi Ayyan, it looks like your original PDF may be a digital PDF without a text layer, with just vector images for the characters. Text in such a PDF looks smooth regardless of the magnification you set to view it. OCR Editor and Hot Folder are document conversion tools, and as such they do rasterize images of pages from a digital PDF when processing it, as Victoria mentioned earlier. Rasterized images have a limited resolution, which results in pixelated images of characters, especially when zoomed in. Also, rasterized images may become bigger than their better-looking vector originals.
To improve the result you're getting using Hot Folder, I'd suggest you also try the combination of settings in the dialog in the screenshot below:
It won't restore the character images spoiled by rasterisation to exactly the original quality, but it may help to some extent by making the rasterised characters noticeably smoother.
We've noted your request for batch adding of a text layer to PDFs without converting them and will consider it for development in the future.
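To illustrate the rasterisation effect described above, here is a rough Python sketch. It is not ABBYY's actual processing: the page size, the 300 dpi resolution and the 24-bit colour depth are assumptions chosen for illustration, and the function names are hypothetical. It estimates how much pixel data a rasterised page holds before compression and how few pixels are left for each character, which is why rasterised text can look pixelated when zoomed in while vector text stays smooth.

```python
def raster_page_bytes(width_in: float, height_in: float, dpi: int,
                      bytes_per_pixel: int = 3) -> int:
    """Uncompressed size in bytes of one page rendered to pixels.

    bytes_per_pixel=3 assumes 24-bit RGB (an assumption for this sketch).
    """
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * bytes_per_pixel


def glyph_height_px(font_size_pt: float, dpi: int) -> float:
    """Approximate pixel height available to a character of a given point size.

    One point is 1/72 inch, so the height in pixels scales linearly with dpi.
    """
    return font_size_pt / 72 * dpi


if __name__ == "__main__":
    # Assumed example: a US Letter page (8.5 x 11 inches) at 300 dpi.
    size = raster_page_bytes(8.5, 11, 300)
    print(f"Uncompressed raster page: {size:,} bytes")

    # Assumed example: 10 pt body text at the same resolution.
    print(f"10 pt glyph height: {glyph_height_px(10, 300):.1f} px")
```

Real PDFs compress page images, so actual files are smaller than this uncompressed figure. The point is only that after rasterisation every page carries a full grid of pixels at a fixed resolution, whereas a vector original stores shapes that can be drawn sharply at any zoom level.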
Yuriy, thank you for the detailed answer.
Yes exactly.
Yes, now I get it. It is a pity that the conversion tools cannot, after rasterizing and recognizing a temporary copy of the document, simply add the new text layer to the original document and delete the copy.
Test results with your recommended settings:
(the result is even worse than with the "Use MRC compression (requires OCR)" / "Apply ABBYY PreciseScan to smooth characters on page images" option disabled)
I got all the answers to my questions, thank you and Victoria again!
It looks like it was bad because of inverted text. For non-inverted text it usually helps:
Without PreciseScan processing:
With PreciseScan processing: