
Improvement Request: Punctuation parsing during OCR

Just a request for a future update: improvements to whatever algorithms, regex strings, or magic incantations add spaces after periods, colons, and other punctuation. I think the OCR process is trying to make sure that a space follows each period at the end of a sentence and each colon within a sentence. However, this consistently causes problems in the following cases:

1. Email addresses: appear as "username@domain. com"

2. Web addresses: appear as "https: //www. example. com"

3. Times: appear as "3: 00 p. m."

4. Numbers with decimals: appear as "$5. 75" or "1,234. 567"

I'd bet there are other similar examples, but these are the types of errors that I've been encountering most frequently.
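If errors like these do show up in exported text, a cleanup pass can undo most of them. Here is a hypothetical Python post-processing step, not anything ABBYY provides. The function name fix_ocr_spacing and its regex rules are illustrative assumptions, tuned only to the four patterns listed above. It closes the gap after "https:", inside email and web host names (anchored on "@", "://", or a leading "www"), between digits around "." or ":", and in "a. m." / "p. m.".

```python
import re

# Hypothetical cleanup for exported OCR text; not part of ABBYY FineReader.

# "https: //" -> "https://"
SCHEME_GAP = re.compile(r"\b(https?|ftp): //")
# A ". " inside an email domain or host name, e.g. "user@domain. com" or
# "www. example. com". Requiring a lowercase letter next leaves a
# capitalized following sentence alone.
HOST_GAP = re.compile(r"((?:@|://|\bwww\b)[\w-]*(?:\.[\w-]+)*)\. (?=[a-z])")
# A space wedged between digits after "." or ":", e.g. "3: 00" or "$5. 75".
NUMBER_GAP = re.compile(r"(\d)([.:]) (?=\d)")
# "p. m." -> "p.m." and "a. m." -> "a.m.", in either case.
AMPM_GAP = re.compile(r"\b([ap])\. (m)\.", re.IGNORECASE)


def fix_ocr_spacing(text: str) -> str:
    """Remove spurious spaces inserted after '.' and ':' in common tokens."""
    text = SCHEME_GAP.sub(r"\1://", text)
    # Each pass closes at most one gap per host name, so repeat until no
    # substitutions are made (every pass removes a space, so this ends).
    count = 1
    while count:
        text, count = HOST_GAP.subn(r"\1.", text)
    text = NUMBER_GAP.sub(r"\1\2", text)
    text = AMPM_GAP.sub(r"\1.\2.", text)
    return text


if __name__ == "__main__":
    for sample in [
        "username@domain. com",
        "https: //www. example. com",
        "3: 00 p. m.",
        "$5. 75 or 1,234. 567",
    ]:
        print(fix_ocr_spacing(sample))
```

These rules are heuristics with known false positives: a sentence ending in a number followed by one starting with a digit ("in 2019. 5 people") would be wrongly joined, and a lowercase word following an email address at the end of a sentence would be glued onto the address.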


Comments


  • Victoria Dvornikova

    Hi Michael,

    I've created a support ticket based on your feedback. Our customer support agent will contact you soon and request some additional information.

  • Michael Hanscom

    Thanks, Victoria. I've uploaded a sample project as requested. Thanks to you and your team for looking into this.

  • Michael Hanscom

    Update in case any other users browse through here: after working with ABBYY tech support, it seems this was a "me problem": something was odd on my computer. After removing and reinstalling ABBYY, I'm no longer seeing these issues. The support ticket is closed, and ABBYY is working as expected again.
