[ProcessTextField] OCR Regex doesn't work

Hello,

We currently use the Process Text Fields method of the Cloud OCR API to recognize our application form. I defined template settings and regions for OCR, but the results returned by the API do not seem to match the regex in my templates. My settings are below; please take a look and help us.

Thanks in advance for your help!

Ex:

1. Settings:

    <text id="phone">
      <language>English</language>
      <letterset>0123456789</letterset>
      <regexp>([0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9])</regexp>
      <texttype>handprinted</texttype>
      <placeholderscount>11</placeholderscount>
      <markingtype>partitionedFrame</markingtype>
      <onetextline>true</onetextline>
      <onewordpertextline>true</onewordpertextline>
    </text>

2. Results:

    Phone: 12832537427212 (Expected: 8325342122)
    Date: 4217212056/ (Expected: 12/21/2016)

Comments

8 comments

  • Oksana Serdyuk

    Please share your image, the used processing settings and your Application ID. Kindly send this info to CloudOCRSDK@abbyy.com.

  • Dang Vinh

    Hi Oksana Serdyuk, I have already sent you the email. Thank you!

  • Oksana Serdyuk

    Hi, I have received your message. Your settings are fine, I have reproduced the issue and now I am consulting with the developers. I will let you know as soon as I get their answer.

  • Oksana Serdyuk

    Could you please explain how critical this issue is for you?

    Also please specify what volumes you plan to process using ABBYY Cloud OCR SDK?

    What is your usage scenario?

  • Dang Vinh

    Hi Oksana Serdyuk, sorry for the late reply. We developed a system for our client, so this is our LIVE product. Please help us get this resolved as soon as possible.

    Here is our purchase history: "Volume Pack L (5000 pages) for Application TLS-Enrollment 14 Nov 2016 42686-00003 $199.99"

    Thanks in advance!

  • Oksana Serdyuk

    Hi, I am consulting with the developers regarding this issue now. I will let you know about the progress.

  • Oksana Serdyuk

    Sorry for the delay. Our team has investigated the issue and concluded that it is not a bug: this behavior is due to the peculiarities of our recognition technology.

    Note that regular expressions and the placeholdersCount parameter do not strictly limit the characters in the output: the recognized value may contain characters that are not covered by the regular expression, and it may contain more or fewer characters than you specified in placeholdersCount. These parameters are needed for more accurate detection and recognition of the text field.
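
Since the service does not enforce these constraints on the output, one option is to validate the returned value on the client side. The following is a minimal sketch, not part of the Cloud OCR SDK: the helper name `check_field` is hypothetical, and the pattern and length are taken from the "phone" settings in the question.

```python
import re

# Pattern and length taken from the "phone" field settings in the question.
PHONE_PATTERN = re.compile(r"[0-9]{11}")
PLACEHOLDERS_COUNT = 11


def check_field(value, pattern, expected_length):
    """Return a list of problems found in a recognized field value.

    Hypothetical client-side helper; an empty list means the value
    satisfies both the length and the pattern.
    """
    problems = []
    if len(value) != expected_length:
        problems.append(
            f"expected {expected_length} characters, got {len(value)}"
        )
    if pattern.fullmatch(value) is None:
        problems.append("value does not match the pattern")
    return problems


# The phone value returned by the API in the question has 14 digits.
print(check_field("12832537427212", PHONE_PATTERN, PLACEHOLDERS_COUNT))
```

A value that fails the check could be routed to manual review rather than accepted as is.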

    In this particular case, the issue arises because the field markup is damaged during binarization and is therefore not detected properly. As a result, the recognized value contains extra characters, most of which are "1" (a markup border that was not properly removed is recognized as "1").

    The image after binarization is the following:

    [image: the field after binarization]

    Our developers recommend increasing the brightness during scanning so that the image is brighter.

    It is also recommended to set the field region as tightly around the field as possible. For example, if we process the "credit_card_number" text field with the following settings:

    ...
      <fieldTemplates>
        <text id="credit_card_number" bottom="0" left="0" right="0" top="0">
          <language>Digits</language>
          <letterSet>0123456789</letterSet>
          <textType>handprinted</textType>
          <oneTextLine>true</oneTextLine>
          <oneWordPerTextLine>true</oneWordPerTextLine>
          <markingType>partitionedFrame</markingType>
          <placeholdersCount>16</placeholdersCount>
        </text>
      </fieldTemplates>
      <page applyTo="0">
        <!--Credit Card-->
        <text id="credit_card_number" bottom="562" right="1361" top="493" left="72" template="credit_card_number"/>
        <!--End Credit Card-->
      </page>
    </document>
    

    it is recognized accurately:

    <text bottom="562" right="1361" top="493" left="72" id="credit_card_number">
        <value>4373740000796405</value>
        <line bottom="551" right="1344" top="494" left="86">
            <char bottom="551" right="141" top="499" left="86">4</char>
            <char bottom="551" right="198" top="497" left="173">3</char>
            <char bottom="546" right="295" top="494" left="243" suspicious="true">7</char>
            <char bottom="551" right="374" top="501" left="336">3</char>
            <char bottom="545" right="476" top="502" left="415" suspicious="true">7</char>
            <char bottom="551" right="535" top="503" left="499">4</char>
            <char bottom="539" right="604" top="503" left="577">0</char>
            <char bottom="540" right="685" top="499" left="657">0</char>
            <char bottom="541" right="771" top="508" left="745">0</char>
            <char bottom="540" right="849" top="511" left="818">0</char>
            <char bottom="551" right="944" top="505" left="889" suspicious="true">7</char>
            <char bottom="550" right="1017" top="501" left="978">9</char>
            <char bottom="551" right="1092" top="506" left="1062">6</char>
            <char bottom="551" right="1190" top="511" left="1135">4</char>
            <char bottom="551" right="1254" top="509" left="1225">0</char>
            <char bottom="551" right="1344" top="507" left="1299">5</char>
        </line>
    </text>
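
A result like the one above marks uncertain characters with suspicious="true", so it can help to list them for manual review. The sketch below uses Python's standard xml.etree.ElementTree on a shortened copy of that fragment (first three characters only, with the value shortened to match). The function name `suspicious_chars` is hypothetical, and the sketch assumes the fragment has no XML namespace, as shown in this thread.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the result fragment: only the first three characters,
# and <value> shortened accordingly for this illustration.
RESULT_XML = """
<text bottom="562" right="1361" top="493" left="72" id="credit_card_number">
    <value>437</value>
    <line bottom="551" right="1344" top="494" left="86">
        <char bottom="551" right="141" top="499" left="86">4</char>
        <char bottom="551" right="198" top="497" left="173">3</char>
        <char bottom="546" right="295" top="494" left="243" suspicious="true">7</char>
    </line>
</text>
"""


def suspicious_chars(field_xml):
    """Return the field value and (position, character) pairs flagged as suspicious."""
    field = ET.fromstring(field_xml.strip())
    value = field.findtext("value")
    flagged = [
        (index, char.text)
        for index, char in enumerate(field.iter("char"))
        if char.get("suspicious") == "true"
    ]
    return value, flagged


value, flagged = suspicious_chars(RESULT_XML)
print(value, flagged)
```

Positions are zero-based and count the char elements in document order.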
    
  • Dang Vinh

    Hi Oksana Serdyuk. Thanks for your help! I will work with the team to improve the image quality and the field settings.
