
How To Edit OCR Text WITHOUT Re-recognizing Existing OCRed Text

So I am importing a bunch of pictures to make a searchable PDF, and of course the OCR processing takes a while. When it is done, FineReader opens the result so I can READ it, but I cannot edit the text. To edit it, I need to RE-recognize the already OCRed text... which is counterproductive.

I should also point out that if I just click the Edit tab at the top, it warns me: "This page contains a text layer under the image. Editing text on this page will change the text layer and make the edited fragment appear on the page".

Also, whenever I have an ALREADY OCRed document with just ONE typo in the OCRed text, how can I import it and edit just that little bit of text without RE-OCRing the whole thing?

Comments


  • Yuriy Korotkevych

    Hi!

    FineReader PDF does different kinds of processing when converting images of documents to a searchable PDF and when preparing an existing PDF for editing. Because the PDF format itself is not editable, FineReader must prepare any PDF for editing, even if the PDF already contains text and even if FineReader itself created it just seconds ago. This preparation uses OCR, but for a PDF that already contains text, OCR is not used to capture the text from the image; it is used to analyze the page layout and then to embed your edits into the document. You can read more about this in our blog article.

    FineReader always starts preparing for editing from the page you are currently on. So if you spot a typo on a page and click "Edit document", you can fix that typo on that page within a few seconds, without waiting for FineReader to finish preparing the other pages. Just continue working with the document, for example by clicking another tab, or by saving and closing it.

    The warning you mentioned is only meant to make users aware that changes made while editing a searchable PDF will also affect the text layer and may also affect the visual appearance of the document.

    To eliminate typos in the OCRed text in your particular case, given the workflow you described, I would recommend using the Verification tool in the OCR Editor already at the stage of converting the document images to a searchable PDF, before saving the results to PDF. Here is the Help section about the tool.

    Hope it helps!

    Yuriy
  • Victoria Dvornikova

    Hello,

    Please try editing the text in the PDF Editor window as described in our online help: https://help.abbyy.com/en-us/finereader/16/user_guide/edittext/.

    If you have any questions about a specific document, please create a support ticket, describe the desired scenario, and attach the file.

  • Manfred Schwarz

    Maybe I didn't understand the solution to the re-editing issue correctly (last paragraph of the original question).

    The problem arises when I OCR a PDF, edit it, and save it, and then days, weeks, or even years later I find more errors that I would like to correct as well.

    When I reopen the PDF in FineReader, it analyzes the document and runs OCR again, and I lose all the corrections I made initially. Can I keep the first corrections and add new corrections later?

    The OCRed and edited text layer must somehow be stored in the PDF, so it would be nice if later editing could build on it.

    Is this possible?

    Thanks

  • Yuriy Korotkevych

    Hello Manfred! 

    You can do that by editing the PDF directly, but with a limitation: FineReader PDF can't edit the text layer of an existing PDF without also editing its visible layer, so the edit may alter the visual appearance of the part of the page surrounding the typo. Whether this is noticeable, and how much, differs from case to case and depends on multiple factors.

    For multipage documents, when direct editing doesn't give you a visually acceptable result, I can suggest another way to correct a typo that avoids re-recognizing the whole document and saves time. Run only the page that contains the typo through FineReader's OCR Editor, verify and correct the OCR results there using the Verify tool, and save that page to a PDF. Then use the Organize Pages tools in the PDF Editor to replace the page with the typo in the original PDF with the newly created, corrected page.

     