Improvement Request: Punctuation parsing during OCR

May 07, 2024 23:16

Just a request for a future update: please improve whatever algorithms, regex patterns, or magic incantations add spaces after periods, colons, and other such punctuation. I think the OCR process is trying to ensure that a space follows each period at the end of a sentence and each colon within a sentence. However, this consistently causes problems in the following cases:

1. Email addresses: appear as "username@domain. com"

2. Web addresses: appear as "https: //www. example. com"

3. Times: appear as "3: 00 p. m."

4. Numbers with decimals: appear as "$5. 75" or "1,234. 567"

I'd bet there are other similar examples, but these are the types of errors that I've been encountering most frequently.
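
As a stopgap, the stray spaces could be stripped in a post-processing pass over the exported text. Below is a minimal Python sketch covering the four cases listed above. The function name `rejoin_punctuation`, the rule list, and the short list of top-level domains are all hypothetical choices made for illustration; none of this reflects how ABBYY's software actually handles punctuation.

```python
import re

# Illustrative cleanup rules (assumptions, not ABBYY behavior).
# Each entry is (compiled pattern, replacement).
_RULES = [
    # URL scheme: "https: //" -> "https://"
    (re.compile(r"\b(https?|ftp): //"), r"\1://"),
    # Host names and email domains: "www. example. com" -> "www.example.com".
    # Only fires when the dotted run ends in one of a few assumed TLDs.
    (re.compile(r"(?<=[\w-])\. (?=(?:[\w-]+\. )*(?:com|org|net|edu|gov)\b)"), "."),
    # Decimals: "5. 75" -> "5.75"
    (re.compile(r"(?<=\d)\. (?=\d)"), "."),
    # Clock times: "3: 00" -> "3:00"
    (re.compile(r"\b(\d{1,2}): (?=\d{2}\b)"), r"\1:"),
    # Meridiem markers: "p. m." -> "p.m."
    (re.compile(r"\b([ap])\. m\.", re.IGNORECASE), r"\1.m."),
]


def rejoin_punctuation(text: str) -> str:
    """Remove spaces that were wrongly inserted after '.' and ':'."""
    for pattern, replacement in _RULES:
        text = pattern.sub(replacement, text)
    return text


if __name__ == "__main__":
    for sample in (
        "username@domain. com",
        "https: //www. example. com",
        "3: 00 p. m.",
        "$5. 75",
        "1,234. 567",
    ):
        print(rejoin_punctuation(sample))
```

These rules are deliberately narrow, but they can still misfire: a sentence that ends in a digit and is followed by one that starts with a digit ("in 2024. 15 people") would be joined by the decimal rule, so the output would need a quick review.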

Comments

3 comments

Victoria Dvornikova

May 09, 2024 10:18
Hi Michael,

I've created a support ticket based on your feedback. Our customer support agent will contact you soon and request some additional information.

Michael Hanscom

May 15, 2024 16:16
Thanks Victoria. I've uploaded a sample project as requested. Thanks to you and your team for looking into this.

Michael Hanscom

May 17, 2024 15:45
Update in case any other users browse through here: After working with ABBYY tech support, it seems this was a "me problem", and something was odd on my computer. After removing and reinstalling ABBYY, I'm no longer seeing these issues. The support ticket is closed and ABBYY's working as expected again.

