Can I restrict OCR to a limited set of characters to recognize?

I use FineReader to OCR scans of century-old newspaper articles. The result is much better than with other OCR software I've tried, but it often gives me odd characters. Is it possible to restrict the set of characters it converts text to?

For example, it sometimes gives me accented characters when the "accent" it sees is really just a faint, stray mark on the old newspaper. 

I'd like to restrict it to the English alphabet (lowercase and uppercase) and common punctuation.

Comments

1 comment

  • Nicholas Keller

    Yes. In FineReader 15, go to Options > Languages and scroll down to the bottom of the second box. Choose Create New language (which is a bit of a misnomer!) and name your edited language, for instance "English with fewer accidentals". Set the source language to English, or whatever it is you use. The next menu has an Alphabet setting, where you can browse through a large range of Unicode characters and select or deselect them, adding or removing characters to be recognized. I believe ABBYY has baseline training for these characters built into its core, although I'm not entirely sure how everything works. I used these functions to create a workable polytonic ancient Greek language with all the accents and breathing marks, and a version of English with extra symbols that I needed it to pick up.

    Finally, there is a user dictionary option where you can give it terms that you want it to treat as valid words in that language. Doing that can speed things up a lot, since not knowing a word causes ABBYY to "think" harder about what it's "seeing".
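The alphabet restriction described here is applied inside FineReader. As a complementary check outside it, text that has already been exported can be filtered against the same kind of whitelist. The following is only a minimal sketch of that idea in Python, not a FineReader feature or API: the `ALLOWED` set, the `restrict_to_allowed` function name, and the decision to keep digits are all assumptions chosen for illustration.

```python
import string
import unicodedata

# Assumption: English letters plus common punctuation, as in the question.
# Digits and whitespace are kept too, which is a choice made for this sketch.
ALLOWED = set(string.ascii_letters + string.digits + " \n\t" + ".,;:!?'\"()-&$%/")


def restrict_to_allowed(text: str, placeholder: str = "") -> str:
    """Fold accented letters to their base letter, then drop disallowed characters."""
    # NFKD decomposition splits a letter such as "é" into "e" plus a
    # separate combining accent character.
    decomposed = unicodedata.normalize("NFKD", text)
    out = []
    for ch in decomposed:
        if unicodedata.combining(ch):
            # Discard the accent itself, keeping the base letter.
            continue
        out.append(ch if ch in ALLOWED else placeholder)
    return "".join(out)
```

For example, `restrict_to_allowed("Thé Dáily Néws, 1912")` gives `The Daily News, 1912`. Passing a visible `placeholder` such as `"?"` instead of the empty string makes dropped characters easy to spot during proofreading.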

    And of course there's setting up training patterns.

    Have fun!
