Why does the visual quality of a PDF page get worse after OCR?

After running OCR on a PDF document, I save it as a PDF, but the visual quality of the saved document is noticeably worse. Why does this happen, when all I need is to add a text layer to the original document? Can anything be done about it?

Comments

16 comments

  • Nina Blokhina

    Hello Peter, I created a Support ticket for your question. Please expect an email from us.

  • Steve Joberson

    I have this exact same question. I created a PDF from a bunch of JPEG files using another program. I open it and it looks great, but it's not searchable, so I thought I'd give FineReader a try since it can OCR and create the text layer. The problem is that the output PDF looks terrible even when I choose a custom 100% image quality setting. I don't know how this stuff works, but there should be a way to have FineReader simply create and save the text layer and not do ANYTHING to the existing PDF quality or format. Instead, it looks like it's taking the OCR result and "re-writing" the PDF, and it looks really lousy.

    Please advise or I'll have to find another software package...

  • Victoria Dvornikova

    Hi Steve Joberson,

    In FineReader PDF 15 (the current version), you can open the PDF file in PDF Editor and select Recognize > Recognize document from the menu. In this case the quality of the file won't be changed.

    In OCR Editor you can try to use the image settings as below (menu Tools > Options > Format Settings):

  • Ayyan Moyer

    menu Recognize > Recognize document

    In ABBYY FineReader PDF 15 Build 15.0.114.4683; part # 1380.13 there is no such menu item:

  • Victoria Dvornikova

    Hello Ayyan Moyer,

    The menu item Recognize > Recognize document is available only in PDF Editor. OCR Editor does rasterize the PDF image after OCR, but in PDF Editor you can launch recognition without rasterization.

  • Ayyan Moyer

    only in PDF Editor

    Thanks for the clarification! This application is not in ABBYY.Store, nor is it in the comparison table between the "Standard", "Business" and "Corporate" versions. Therefore, I guess "PDF Editor" is included in all packages. Which of the following files does the PDF Editor application launch?

    1. AbbyySTI.exe
    2. AInfo.exe
    3. App.StatisticSender.exe
    4. Comparator.exe
    5. FineCmd.exe
    6. FineExec.exe
    7. FineReader.exe
    8. FineReaderOCR.exe
    9. FineUpdate.exe
    10. HotFolder.exe
    11. Module64\Drivers\PrnInstaller.exe
    12. Module64\pdfSaver5af15.exe
    13. Module86\Drivers\PrnInstaller.exe
    14. Module86\pdfSaver5af15.exe
    15. OcrEngine.Background.Host.exe
    16. pdfSaver5af15.exe
    17. PrinterIntegration.exe
    18. Registrator.exe
    19. ScanTwain.exe
    20. ScanWia.exe
    21. ScreenshotReader.exe
    22. TrigrammsInstaller.exe
    23. Uninstall.exe
    24. UpdateInstaller.exe

  • Victoria Dvornikova

    Ayyan Moyer
    FineReader PDF Editor can be started from the Start menu as below:

    The process from your list is FineReader.exe

  • Ayyan Moyer

    @Victoria, when I launch FineReader.exe, the ABBYY FineReader PDF 15 window appears. The process manager lists this application as "ABBYY FineReader PDF 15 (32 bit)". There is no "Recognize" menu in it:

    Only "File", "Edit", "View", "Tools" and "Help". This item appears only when the document is open:

    "File" / "Recognize Document" / "Recognize Document ... Ctrl + Shift + R" is that what you mean?

    This item is not in the "File" menu when no document is open in the application.

    Is this menu really in different places in different builds of the 15th version of the application?

  • Victoria Dvornikova

    Hi Ayyan Moyer,

    "File" / "Recognize Document" / "Recognize Document ... Ctrl + Shift + R" is that what you mean?

    Yes, this is another way to start the same process. Previously I meant the following Recognize menu item in the toolbar of PDF Editor:

  • Ayyan Moyer

    Hot Folder spoils documents just like FineReader PDF OCR Editor does. Is there a way to batch add an invisible text layer to documents with the same quality that FineReader PDF achieves?

  • Victoria Dvornikova

    Hello Ayyan Moyer,

    No, currently Hot Folder works as OCR Editor only. But I have forwarded your suggestion to our R&D department.

  • Victoria Dvornikova

    A small addition from me regarding a possible workaround in Hot Folder. You can try changing some settings in the Hot Folder task to see whether it works for you:

    1. On the Save step, select the PDF format and click Options.

    2. Select Custom in Image Quality and disable MRC compression.

    3. Disable the Reduce original resolution option.

    4. Disable changes in Color and Image quality.

  • Ayyan Moyer

    Victoria, thank you for your answer!

    I ran a test with your recommended settings.

    The original file is 34.5 MB. After adding a text layer in FineReader PDF, it is 33.9 MB. After processing by the Hot Folder application, it is 114 MB. The size has more than tripled!
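
For reference, the size changes reported above can be checked with a few lines of Python. This is only a sketch: the function name `size_ratio` is made up for illustration, and the three sizes are the ones quoted in this comment.

```python
def size_ratio(original_mb: float, result_mb: float) -> float:
    """Return how many times larger (or smaller) the result is than the original."""
    return result_mb / original_mb


original = 34.5      # MB, original file (from the comment above)
pdf_editor = 33.9    # MB, after adding a text layer in FineReader PDF
hot_folder = 114.0   # MB, after processing in Hot Folder

print(f"FineReader PDF: {size_ratio(original, pdf_editor):.2f}x the original size")
print(f"Hot Folder:     {size_ratio(original, hot_folder):.2f}x the original size")
```

The Hot Folder output comes out at roughly 3.3 times the original size, while the FineReader PDF output is slightly smaller than the original.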

    The quality is unsatisfactory:

    (on the left is the original, on the right is the document after processing it in the Hot Folder)

  • Yuriy Korotkevych

    Hi Ayyan, it looks like your original PDF may be a digital PDF without a text layer, with just vector images for the characters. Text in such a PDF looks smooth regardless of the magnification you view it at. OCR Editor and Hot Folder are document conversion tools, and as such they do rasterize images of pages from a digital PDF when processing it, as Victoria mentioned earlier. Rasterized images have a limited resolution, which results in pixelated images of characters, especially when zoomed in. Also, rasterized images may become bigger than their better-looking vector originals.
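
To illustrate why a rasterized page has a fixed resolution and can grow in size, here is a rough sketch in Python. The page size (US Letter, 8.5 x 11 in), the DPI values, and 24-bit color are assumptions chosen for illustration. The thread does not say what FineReader actually uses, and real PDFs compress their images, so actual file sizes will be smaller than the uncompressed figures computed here.

```python
def raster_size(width_in: float, height_in: float, dpi: int, bytes_per_pixel: int = 3):
    """Return (width_px, height_px, uncompressed_bytes) for a page rasterized at `dpi`.

    bytes_per_pixel=3 assumes 24-bit RGB color (an assumption, not a FineReader setting).
    """
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px, height_px, width_px * height_px * bytes_per_pixel


# Hypothetical US Letter page rasterized at a few common resolutions.
for dpi in (150, 300, 600):
    w, h, n = raster_size(8.5, 11, dpi)
    print(f"{dpi} dpi: {w} x {h} px, {n / 1e6:.1f} MB uncompressed")
```

Once the page is a grid of pixels, zooming in cannot reveal more detail than that grid holds, and doubling the DPI quadruples the pixel count. A vector page has no such grid, which is why it stays smooth at any magnification and can be smaller.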

    To improve the result you're getting with Hot Folder, I'd suggest also trying the combination of settings in the dialog on the screenshot below:

    It won't restore the character images spoiled by rasterization to exactly the original quality, but it may help to some extent by making the rasterized characters noticeably smoother.

    We've noted your request for batch adding of a text layer to PDFs without their conversion and will consider it for development in the future.

  • Ayyan Moyer

    Yuriy, thank you for the detailed answer.

    a digital PDF without a text layer, with just vector images for the characters

    Yes exactly.

    OCR Editor and Hot Folder are document conversion tools

    Yes, now I get it. It is a pity that the conversion tools cannot, after rasterizing and recognizing a temporary copy of the document, simply add a new layer to the original document and then delete the copy.

    Test results with your recommended settings:

    (the result is even worse than with the "Use MRC compression (requires OCR)" / "Apply ABBYY PreciseScan to smooth characters on page images" option disabled)

    I got all the answers to my questions, thank you and Victoria again!

  • Yuriy Korotkevych

    It looks like it was bad because of inverted text. For non-inverted text it usually helps:

    Without PreciseScan processing:

    With PreciseScan processing:
