
Merge OCR results in one big page

Written by Permanently deleted user

July 31, 2021 12:43

Hi,

I hope I can explain what I want; my English is very limited. When I convert a PDF document or single image files using the OCR Editor, the editor creates a separate page for every input page. Very often a word is hyphenated at the end of a page. The spell checker cannot fix this: it finds the first part of the word on one page and the second part only when it checks the next page, so it cannot bring the two parts together. This does work when the hyphenation happens within a page, at the end of a line.

What I've done is merge a lot of image files into one big image file (usually one file per chapter) and then import it. After OCR I have one big page, but sometimes the OCR crashes. It is a little bit unstable.
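If the pages can be exported as plain text, a word split across a page break could also be rejoined in post-processing before spell checking. The following is a minimal Python sketch under stated assumptions: each page is a separate string, and a hyphen (ASCII `-` or U+00AD SOFT HYPHEN) directly after a letter at the very end of a page always marks a split word. `join_pages` is a hypothetical helper, not a feature of the OCR Editor, and it cannot tell a genuine compound hyphen at a page end from a line-break hyphen.

```python
import re

# A letter followed by a hyphen (ASCII '-' or soft hyphen U+00AD)
# and optional trailing whitespace at the very end of a page.
_PAGE_END_HYPHEN = re.compile(r"(?<=\w)[-\u00ad]\s*$")


def join_pages(pages: list[str]) -> str:
    """Concatenate page texts, rejoining words hyphenated across a page break.

    Assumption: a hyphen right after a letter at the end of a page always
    marks a split word, so a genuine compound hyphen there is removed too.
    """
    result = ""
    for page in pages:
        if result and _PAGE_END_HYPHEN.search(result):
            # Drop the trailing hyphen and glue the next page's first word on.
            result = _PAGE_END_HYPHEN.sub("", result) + page.lstrip()
        elif result:
            # Ordinary page break: keep the pages on separate lines.
            result += "\n" + page
        else:
            result = page
    return result


pages = ["The spell checker finds the first part of the hyphen-", "ation on the next page."]
print(join_pages(pages))  # The spell checker finds the first part of the hyphenation on the next page.
```

Running the spell checker on the joined text then sees each split word as a single word.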


Comments

3 comments

Victoria Dvornikova

August 02, 2021 07:14
Hello,

For DOC/DOCX output, hyphens can be removed under Tools > Options > Format Settings, in the settings for the Editable copy layout.

If your output format is searchable PDF, we currently do not have any workaround for this issue. I have forwarded your suggestion to add an option that disables the end-of-paragraph mark when there is a hyphen at the end of a page.

For the crash issue, we will need the source files and the OCR project (saved before the crash). Please create a support request with this information, and our Customer Support Team will help you.

Jan Ullrich

May 24, 2024 10:07
Hello Victoria,

I'm dealing with the same issue. I've tried the solution you described above, but the OCRed text still contains optional hyphens at the ends of lines, whether or not the check box you indicated is selected. They are present in all output formats.
I also tried removing ALL hyphens with the search-and-replace function (replacing them with nothing). At first this seemed to work, but as soon as I save my project, a space appears in place of every removed hyphen. This is very annoying.

Having to delete the optional hyphens is the one thing that SIGNIFICANTLY slows down my work, as there are a lot of them in my text. If there is a solution, I would really appreciate it if you let me know.
Thank you

Jan
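As a possible stopgap outside the program, optional hyphens can be stripped from an exported plain-text file. This is a minimal Python sketch that assumes the export encodes optional hyphens as U+00AD SOFT HYPHEN (an assumption worth checking on a sample file; Word documents may store them differently, so this targets plain-text exports only). Because it deletes only that one character, ordinary hyphens and spacing are left untouched, unlike a replace-all on `-`. `strip_optional_hyphens` and `clean_file` are hypothetical helpers, and the file paths are placeholders.

```python
from pathlib import Path

# Assumption: optional hyphens appear in the export as Unicode SOFT HYPHEN.
SOFT_HYPHEN = "\u00ad"


def strip_optional_hyphens(text: str) -> str:
    """Remove soft hyphens; regular hyphens ('-') and spaces are kept."""
    return text.replace(SOFT_HYPHEN, "")


def clean_file(src: Path, dst: Path) -> None:
    """Write a copy of a UTF-8 text export with soft hyphens removed.

    Hypothetical paths: src is the exported file, dst the cleaned copy.
    """
    text = src.read_text(encoding="utf-8")
    dst.write_text(strip_optional_hyphens(text), encoding="utf-8")


if __name__ == "__main__":
    print(strip_optional_hyphens("opti\u00adonal hy\u00adphens, well-known words"))
    # optional hyphens, well-known words
```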

Maiia Chenchyk

May 24, 2024 11:51
Hi Jan,
I have created a support request for the case you described. Please wait for a response from our Support Team for further guidance on the issue.

