Community

processTextField ignoring letterset and regexp params

Written by Permanently deleted user

March 11, 2014 07:06

Hello,

I'm just getting started with OCR SDK. I'm using PHP to POST an image of a serial number, and I'm testing out letterSet. Despite setting it as follows:

http://cloud.ocrsdk.com/processTextField?language=english&letterSet=0123456789*

I still get letters returned in the XML result set. I've also been testing the RegExp parameter, and that also seems to be ignored (returning letters where only numbers are specified in the RegExp). In this case, I am expanding letterSet to include all letters and numbers and adding this regexp parameter:

&regExp=[A-Z]?[A-Z][0-9]{4}

What I am trying to do is have OCR recognize a serial number in the format: A?ANNNN (A=letter, N=number) where only digits can appear in positions 3-6, and only a one or two letter prefix (A-Z).

I assume that the parameters for processTextField are sent in the URL string (GET) as opposed to sending with the POST along with the image?

I did see the post about using the "Digits" language, but my requirements are more than what is contained in that language.

Thanks.


Comments

7 comments

Permanently deleted user

March 11, 2014 17:15
So that we can test it, could you please share the image you are recognizing, or send it to CloudOCRSDK@abbyy.com?

Permanently deleted user

March 12, 2014 00:30
I sent a detailed message to that email address. Thanks.

Permanently deleted user

March 18, 2014 02:59
Thank you. We have received your letter and will reply tomorrow.

Permanently deleted user

March 18, 2014 17:30
The issue occurs because the OCR technologies are not well trained for this font. It should be fixed in the future.

We found out that both of your images can be recognized completely with the following URL: http://cloud.ocrsdk.com/processTextField?textType=handprinted

Thank you for your patience!

Permanently deleted user

March 18, 2014 19:12
Thanks for your reply.

I had tried "handprinted" as well as all the other textType types during testing, but handprinted failed on many more of the other images I tested.

I found that using "textType=normal,typewriter" generated the smallest number of OCR errors for my images. Really, the only image that failed with "textType=normal,typewriter" was the one I emailed you.

Can you explain how the RegExp parameter works? Abbyy still returns values that would not pass the RegExp I'm using.

In the meantime, I'll just write some code on my end to detect the misreads that violate the RegExp, and try to correct them before passing them to my application.
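
A client-side check of that kind could look roughly like the Python sketch below. It assumes the A?ANNNN format described in the original post; the look-alike character tables and the function name `repair_serial` are hypothetical and would need tuning against the actual misreads.

```python
import re

# Target format from this thread: one or two letters, then four digits.
SERIAL_RE = re.compile(r"[A-Z]{1,2}[0-9]{4}")

# Assumed look-alike pairs (not from ABBYY); adjust to the confusions you see.
LETTER_TO_DIGIT = {"O": "0", "Q": "0", "D": "0", "I": "1", "L": "1",
                   "Z": "2", "S": "5", "G": "6", "B": "8"}
DIGIT_TO_LETTER = {"0": "O", "1": "I", "2": "Z", "5": "S", "6": "G", "8": "B"}


def repair_serial(raw):
    """Return a serial that matches SERIAL_RE, or None if it cannot be repaired."""
    text = re.sub(r"\s+", "", raw).upper()
    if SERIAL_RE.fullmatch(text):
        return text
    if len(text) not in (5, 6):
        return None  # wrong length: do not guess
    prefix, digits = text[:-4], text[-4:]
    # Letters are expected in the prefix and digits in the last four positions.
    fixed = ("".join(DIGIT_TO_LETTER.get(c, c) for c in prefix)
             + "".join(LETTER_TO_DIGIT.get(c, c) for c in digits))
    return fixed if SERIAL_RE.fullmatch(fixed) else None
```

Values that come back as None are probably best flagged for manual review rather than corrected automatically.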

Thanks again.

Permanently deleted user

March 18, 2014 20:01
We have found two bugs that should be fixed in the near future and that can be worked around for now:

The regular expression does not work when the language is specified directly. We recommend that you do not specify the language in the URL (letterSet and regExp are enough).

Something is wrong with the asterisk in the letterSet: when it is used with the handprinted text type, an error occurs. If all of your expressions end with an asterisk, you can probably recognize only the text before it.
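
As an illustration of the first workaround, here is a minimal Python sketch that builds a request URL without the language parameter. The parameter names (letterSet, regExp, textType) and the endpoint are the ones mentioned in this thread; the helper name `build_text_field_url` is hypothetical, and the regExp value is only a guess at how the A?ANNNN format from the original post might be written in the documented syntax.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

BASE_URL = "http://cloud.ocrsdk.com/processTextField"


def build_text_field_url(letter_set, reg_exp, text_type=None):
    """Build the query string without a 'language' parameter."""
    params = {"letterSet": letter_set, "regExp": reg_exp}
    if text_type:
        params["textType"] = text_type
    # urlencode percent-encodes brackets, braces, pipes and commas.
    return BASE_URL + "?" + urlencode(params)


url = build_text_field_url(
    letter_set="ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789",
    reg_exp="(|[A-Z])[A-Z][0-9]{4}",  # assumed pattern, not confirmed by ABBYY
    text_type="normal,typewriter",
)
print(url)
```

Encoding the values matters because characters such as `[`, `{` and `|` are not safe to send raw in a query string. The image itself would still be sent in the POST body.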
Permanently deleted user

March 18, 2014 20:03
Also, the syntax you use is slightly different from the one described in the manual: http://ocrsdk.com/documentation/specifications/regular-expressions/

For this text

FNNNNNNNNB or AFNNNNNNNNB

where

A=letters A thru M

F=letters A thru L

B=letters A thru Z (excluding letters "O" and "Z")

N=digits 0-9

you can use, for example, this regExp:

(|[A-M])[A-L][0-9]{8}([A-N]|[P-Y])
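
To sanity-check returned values against this format on the client side, a short Python sketch follows. The service's regExp syntax is defined in the manual linked above and is not the same as Python's `re` syntax in general; this particular pattern appears to use only constructs that Python also accepts, so it is reused as-is here. The sample strings are made up for illustration.

```python
import re

# Same pattern as above: optional A-M, then A-L, eight digits, then A-Y except O.
PATTERN = re.compile(r"(|[A-M])[A-L][0-9]{8}([A-N]|[P-Y])")


def matches_format(text):
    """True only if the whole string fits FNNNNNNNNB or AFNNNNNNNNB."""
    return PATTERN.fullmatch(text) is not None


samples = ["F12345678B", "AF12345678B", "F12345678O", "ZF12345678B", "F1234567B"]
for sample in samples:
    print(sample, matches_format(sample))
```

The first two samples fit the format; the others fail because of the excluded letter "O", a first letter outside A thru M, and a missing digit.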