processTextField ignoring letterset and regexp params

Hello,

I'm just getting started with OCR SDK. I'm using PHP to send a POST request with an image of a serial number, and I'm testing letterSet. Despite setting it as follows:

http://cloud.ocrsdk.com/processTextField?language=english&letterSet=0123456789*

I still get letters returned in the XML result set. I've also been testing the RegExp parameter, and that also seems to be ignored (returning letters where only numbers are specified in the RegExp). In this case, I am expanding letterSet to include all letters and numbers and adding this regexp parameter:

&regExp=[A-Z]?[A-Z][0-9]{4}

What I am trying to do is have OCR recognize a serial number in the format A?ANNNN (A=letter, N=digit): a one- or two-letter prefix (A-Z) followed by exactly four digits.

I assume that the parameters for processTextField are sent in the URL string (GET) as opposed to sending with the POST along with the image?
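For illustration, here is a minimal Python sketch of building that request URL. It assumes the parameters do go in the query string, which is exactly the open question above, and `build_url` is a hypothetical helper, not part of the SDK. The point is that characters such as `?`, `[` and `{` in the regExp value need percent-encoding so they are not mistaken for URL syntax.

```python
from urllib.parse import urlencode

# Endpoint as quoted in this post.
BASE_URL = "http://cloud.ocrsdk.com/processTextField"


def build_url(params):
    """Return the endpoint URL with params percent-encoded in the query string."""
    return BASE_URL + "?" + urlencode(params)


url = build_url({
    "letterSet": "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789",
    "regExp": "[A-Z]?[A-Z][0-9]{4}",
})
print(url)
```

Whether missing encoding contributes to the behaviour described here is not something I can confirm; it is just one thing worth ruling out.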

I did see the post about using the "Digits" language, but my requirements are more than what is contained in that language.

Thanks.

7 comments

  • Anastasia Galimova

    To let us test it, could you please share the image you are recognizing, or send it to CloudOCRSDK@abbyy.com?

  • HankLloydRight

    I sent a detailed message to that email address. Thanks.

  • Anastasia Galimova

    Thank you. We have received your letter and will reply tomorrow.

  • Anastasia Galimova

    The issue occurs because the OCR technology is not well trained for this font. This should be fixed in the future.

    We found that both of your images can be recognized completely with the following URL: http://cloud.ocrsdk.com/processTextField?textType=handprinted

    Thank you for your patience!

  • HankLloydRight

    Thanks for your reply.

    I had tried "handprinted" as well as all the other textType types during testing, but handprinted failed on many more of the other images I tested.

    I found that using "textType=normal,typewriter" generated the smallest number of OCR errors for my images. Really, the only one image that failed with "textType=normal,typewriter" was the one I emailed you.

    Can you explain how the RegExp parameter works? Abbyy still returns values that would not pass the RegExp I'm using.

    In the meantime, I'll just write some code on my end to detect the misreads that violate the RegExp, and try to correct them before passing them to my application.
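As a rough sketch of that client-side check (in Python rather than PHP, and covering only detection, not correction), something like the following would flag results that do not fit the A?ANNNN format. `is_valid_serial` is a hypothetical helper name.

```python
import re

# One or two uppercase letters followed by exactly four digits (A?ANNNN).
SERIAL_RE = re.compile(r"[A-Z]?[A-Z][0-9]{4}")


def is_valid_serial(text):
    """Return True if the whole recognized string fits the serial format."""
    return SERIAL_RE.fullmatch(text.strip()) is not None


for candidate in ["AB1234", "A1234", "A12B4", "ABC123"]:
    print(candidate, is_valid_serial(candidate))
```

Using `fullmatch` rather than `search` matters here, since a misread with extra characters could otherwise still contain a valid-looking substring.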

    Thanks again.

  • Anastasia Galimova

    We have found two bugs, which should be fixed in the near future and can be worked around for now:

    1. The regular expression does not work when the language is specified directly. We recommend not specifying the language in the URL (letterSet and regExp are enough).

    2. Something is wrong with the asterisk in the letterSet: when it is used with the handprinted text type, an error occurs. If all of your expressions end with an asterisk, you can probably recognize only the text before it.
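A minimal Python sketch of applying these two workarounds to a set of request parameters before building the URL. `apply_workarounds` is a hypothetical helper, and dropping the asterisk from letterSet is only one reading of the second point; it assumes the trailing asterisk in the image can simply be left unrecognized.

```python
def apply_workarounds(params):
    """Return a copy of params adjusted for the two bugs described above."""
    cleaned = dict(params)

    # Bug 1: regExp is ignored when language is given explicitly,
    # so omit language whenever a regExp is present.
    if "regExp" in cleaned:
        cleaned.pop("language", None)

    # Bug 2 (assumed interpretation): an asterisk in letterSet causes an
    # error with the handprinted text type, so remove it in that case.
    if "handprinted" in cleaned.get("textType", "") and "letterSet" in cleaned:
        cleaned["letterSet"] = cleaned["letterSet"].replace("*", "")

    return cleaned


original = {
    "language": "english",
    "letterSet": "0123456789*",
    "regExp": "[0-9]{4}",
    "textType": "handprinted",
}
print(apply_workarounds(original))
```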

  • Anastasia Galimova

    Also, the syntax you use is slightly different from the one described in the manual: http://ocrsdk.com/documentation/specifications/regular-expressions/

    For this text


    FNNNNNNNNB or AFNNNNNNNNB

    where

    • A=letters A thru M
    • F=letters A thru L
    • B=letters A thru Z (excluding letters "O" and "Z")
    • N=digits 0-9

    you can use, for example, this regExp:

    (|[A-M])[A-L][0-9]{8}([A-N]|[P-Y])
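To sanity-check results against this format on the client side, the same expression can be tried in Python. This is only a sketch: it assumes the constructs used here (groups, `|`, character ranges, `{8}`) mean the same in Python's `re` module as in the OCR SDK syntax, which need not hold for other expressions. `matches_format` is a hypothetical helper.

```python
import re

# Optional A-M prefix, then A-L, eight digits, and a final letter
# from A-Y excluding O (so neither "O" nor "Z" is accepted).
PATTERN = re.compile(r"(|[A-M])[A-L][0-9]{8}([A-N]|[P-Y])")


def matches_format(text):
    """Return True if the whole string fits FNNNNNNNNB or AFNNNNNNNNB."""
    return PATTERN.fullmatch(text) is not None


for sample in ["C12345678K", "MC12345678K", "C12345678O", "C12345678Z"]:
    print(sample, matches_format(sample))
```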
