How to ignore special characters in OCR recognition

Good day.
Is there a way to switch off or ignore certain characters during recognition? We transfer text from the recognised PDF into an XML file, and special characters like "□" cause problems there.
Therefore, I would like to configure the job on the server so that certain characters are not replaced by "?" or "□", but are simply left out of the output.
The same applies to other ASCII and UTF-8 characters, e.g.:
Ͱ ˩ ˥ ˦ ˧ ˾ ┌ ┐ └ ┘ ⌐ ■ □ ¬
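
As a stopgap, in case no such server setting exists, the characters could be removed after recognition and before the text is written to the XML. Below is a minimal Python sketch, assuming the recognised text is available as a plain string. The names UNWANTED and strip_unwanted are hypothetical and not part of any OCR product:

```python
# Characters to drop, taken from the list above (extend as needed).
UNWANTED = "Ͱ˩˥˦˧˾┌┐└┘⌐■□¬"

# str.maketrans with a third argument maps each of those characters to None,
# so str.translate deletes them instead of replacing them.
DELETE_TABLE = str.maketrans("", "", UNWANTED)


def strip_unwanted(text: str) -> str:
    """Return text with every character in UNWANTED removed."""
    return text.translate(DELETE_TABLE)


if __name__ == "__main__":
    sample = "Invoice □ 2024 ¬ total"
    print(strip_unwanted(sample))
```

This only cleans the extracted text; it does not change what the recognition step itself produces.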

Thank you for your support.

Thomas Berg
