Recognition of "punched" font

Hello Community!

Hello ABBYY!

More often than I'd like, I have to OCR considerable volumes of what could be called "punched" text. It looks like this:

So far, I tried:

1. Training FineReader 15 on this font. The OCR result did not improve enough to make editing the output worthwhile.

2. Preprocessing the image in graphics-editing software before OCR to improve the structure of the font. I tried a descreen filter in GIMP, conversion to black and white, and erosion in Corel PSP, all to no avail with this particular sample.
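
For anyone who wants to experiment with the preprocessing route in code, the sketch below shows morphological closing (dilation followed by erosion), a standard way to merge isolated dots into continuous strokes. It is only an illustration of the idea, not a step from the list above and not a confirmed fix for this font. The helpers `dilate`, `erode` and `close_gaps` are hypothetical names written for this example, the image is assumed to be already binarized with 1 for ink and 0 for background, and the radius would have to be tuned to the dot spacing of the actual scan.

```python
# Sketch only: morphological closing on a binary image stored as a list of rows.
# Assumption: 1 = ink, 0 = background. All helper names are hypothetical.

def _window(img, y, x, radius):
    """Yield the in-bounds pixels of the square neighbourhood around (y, x)."""
    h, w = len(img), len(img[0])
    for ny in range(max(0, y - radius), min(h, y + radius + 1)):
        for nx in range(max(0, x - radius), min(w, x + radius + 1)):
            yield img[ny][nx]

def dilate(img, radius=1):
    """Mark a pixel as ink if any pixel in its neighbourhood is ink."""
    return [[int(any(_window(img, y, x, radius)))
             for x in range(len(img[0]))]
            for y in range(len(img))]

def erode(img, radius=1):
    """Keep a pixel as ink only if every in-bounds neighbour is ink."""
    return [[int(all(_window(img, y, x, radius)))
             for x in range(len(img[0]))]
            for y in range(len(img))]

def close_gaps(img, radius=1):
    """Closing: dilate to bridge gaps between dots, then erode to restore thickness."""
    return erode(dilate(img, radius), radius)

if __name__ == "__main__":
    # A horizontal stroke drawn as separate dots with one-pixel gaps.
    dotted = [[0] * 11 for _ in range(5)]
    for x in (2, 4, 6, 8):
        dotted[2][x] = 1
    for row in close_gaps(dotted, radius=1):
        print("".join("#" if p else "." for p in row))
```

On a real scan the same operation is available in most image-processing tools. Whether the OCR engine copes better with the closed strokes still has to be checked on the sample itself.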

Has anyone come across this type of "font" and found a way to get a decent OCR result with it?

Thanks in advance for any suggestion!

Comments

2 comments

  • Yuriy Korotkevych

    Hello Piotr,

    Unfortunately, I can't really suggest anything that would solve this problem.

    However, we at ABBYY would like to know more about the case. Which documents or images use this type of printing? Where do they come from, and where are they used? If you could share some document samples with us, it would be very helpful.

    Best regards,

    Yuriy

  • Piotr Piela

    Hi Yuriy,

    Thanks for your reply. I suspected it was not a simple problem to solve.

    You can find this type of printing in patent descriptions shared by the inventors' companies. I suppose it is an anti-OCR measure meant to discourage unauthorized use of someone else's content while the patent application is still going through legal proceedings. However, when you're a translator hired by a company to translate patent documents that the company itself provides, as I am, and you're denied a selectable version of the document, you have to find a workaround somehow. Retyping the whole thing is not an option.

    Unfortunately, I cannot share the documents I have, for confidentiality reasons. However, images with this type of printing often end up in the final, publicly available versions of patent documents published by patent offices. Take, for example, EP3458448B1, downloadable from the EPO publication server.

    It would be a great help to people like me if ABBYY could support this special case in its engine.

    Best regards,

    Piotr
