processTextField Method

This method extracts the value of a text field from an image. It loads the image, creates a processing task for the image with the specified parameters, and submits the task for processing.

Replace <PROCESSING_LOCATION_ID> in the following request URL with the processing location of your application:

[POST] https://<PROCESSING_LOCATION_ID>.ocrsdk.com/v2/processTextField

The image file is transmitted in the request body. See the list of supported input formats.

The fieldLevelRecognition profile is used for processing.

The result of recognition is returned in XML format.

See How to Recognize Text Fields to learn how to tune the parameters.
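As an illustration of the request described above, here is a minimal Python sketch that assembles (but does not send) the POST request. Several details are assumptions not stated on this page: that parameters are passed in the URL query string, that the service uses HTTP Basic authentication with an application ID and password, and that application/octet-stream is an acceptable content type. The function name build_request and the location ID used in the usage note are hypothetical.

```python
import base64
import urllib.parse
import urllib.request


def build_request(location_id, app_id, password, image_bytes, **params):
    """Assemble a POST request for processTextField without sending it."""
    url = f"https://{location_id}.ocrsdk.com/v2/processTextField"
    # Assumption: method parameters travel in the query string.
    query = urllib.parse.urlencode(params)
    if query:
        url += "?" + query
    # The image file is transmitted in the request body.
    request = urllib.request.Request(url, data=image_bytes, method="POST")
    # Assumption: HTTP Basic authentication with application ID and password.
    credentials = f"{app_id}:{password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    request.add_header("Authorization", "Basic " + token)
    # Assumption: a generic binary content type is accepted.
    request.add_header("Content-Type", "application/octet-stream")
    return request
```

A call such as build_request("my-location", app_id, password, image_bytes, language="English", oneTextLine="true") returns a request object that could then be passed to urllib.request.urlopen. Keeping construction separate from sending makes it easy to inspect the URL and headers first.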

Parameters

Parameter Is required Default value Description
image No - The file that contains the region that needs to be recognized
region No "-1,-1,-1,-1" Specifies the region of the text field on the image. The coordinates of the region are measured in pixels relative to the top-left corner of the image and are specified in the following order: left, top, right, bottom. By default, the whole image is used as the region.
language No "English" Specifies the recognition language of the document. This parameter can contain several language names separated by commas, for example "English,French,German". See the list of available recognition languages. Note that not all languages are available for handprint recognition. The languages that are available for handprint recognition are marked with a special comment.
letterSet No "" Specifies the letter set to use during recognition. Contains a string with the letter set characters, for example "ABCDabcd'-.". By default, the letter set of the language specified in the language parameter is used.
regExp No ""

Specifies the regular expression that defines which words are allowed in the field. See the description of regular expressions. By default, the set of allowed words is defined by the dictionary of the language specified in the language parameter.

Note that regular expressions do not strictly limit the set of characters in the output, i.e. the recognized value may contain characters that are not included in the regular expression. During recognition, all hypotheses for a word are checked against the specified regular expression. If a recognition variant conforms to the expression, it has a higher probability of being selected as the final recognition output. But if no variant matches the regular expression, the result will not conform to the expression. If you want to limit the set of characters that can be recognized, the best way to do it is to use the letterSet parameter.

textType No "normal" Specifies the type of the text in the field. This parameter may also contain several text types separated with commas, for example "normal,matrix". The following values can be used:
  • normal
  • typewriter
  • matrix
  • index
  • handprinted
  • ocrA
  • ocrB
  • e13b
  • cmc7
  • gothic
oneTextLine No "false" Specifies whether the field contains only one text line. The value should be true, if there is one text line in the field; otherwise it should be false.
oneWordPerTextLine No "false" Specifies whether the field contains only one word in each text line. The value should be true if no text line contains more than one word (each line of text is then recognized as a single word); otherwise it should be false.
markingType No "simpleText" This parameter applies only to handprint recognition. Specifies the type of marking around letters (for example, underline, frame, box, etc.). By default, there is no marking around letters. The value can be one of the following:
  • simpleText
  • underlinedText
  • textInFrame
  • greyBoxes
  • charBoxSeries
  • simpleComb
  • combInFrame
  • partitionedFrame
Note: For correct handprint recognition specify the value of the placeholdersCount parameter.
placeholdersCount No "1" Specifies the number of character cells for the field.

This parameter is meaningful only for the field marking types (the markingType parameter) that imply splitting the text into cells.

The default value is 1, but you should set the appropriate value for the text to be recognized correctly.

writingStyle No "default" Provides additional information about the writing style of handprinted letters. It can be one of the following:
  • default
  • american
  • german
  • russian
  • polish
  • thai
  • japanese
  • arabic
  • baltic
  • british
  • bulgarian
  • canadian
  • czech
  • croatian
  • french
  • greek
  • hungarian
  • italian
  • romanian
  • slovak
  • spanish
  • turkish
  • ukrainian
  • common
  • chinese
  • azerbaijan
  • kazakh
  • kirgiz
  • latvian
description No "" Contains the description of the processing task. Must contain no more than 255 characters.
pdfPassword No "" Contains a password for accessing password-protected images in PDF format.
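The string formats used by several parameters above (comma-separated lists, "true"/"false" flags, and the left, top, right, bottom region order) can be sketched as a small helper. This is only an illustration: the function names are hypothetical, and the region sanity check (non-negative coordinates with left < right and top < bottom) is an assumption rather than a documented rule.

```python
# Text types listed in the textType parameter description.
TEXT_TYPES = {
    "normal", "typewriter", "matrix", "index", "handprinted",
    "ocrA", "ocrB", "e13b", "cmc7", "gothic",
}


def format_region(left, top, right, bottom):
    """Format a region as 'left,top,right,bottom' (pixels from the top-left corner)."""
    # Assumption: a usable region has non-negative, strictly increasing bounds.
    if not (0 <= left < right and 0 <= top < bottom):
        raise ValueError("expected 0 <= left < right and 0 <= top < bottom")
    return f"{left},{top},{right},{bottom}"


def build_field_params(region=None, languages=("English",),
                       text_types=("normal",), one_text_line=False,
                       placeholders_count=None):
    """Build a dict of processTextField parameters as strings."""
    unknown = set(text_types) - TEXT_TYPES
    if unknown:
        raise ValueError(f"unsupported textType values: {sorted(unknown)}")
    params = {
        "language": ",".join(languages),      # e.g. "English,French"
        "textType": ",".join(text_types),     # e.g. "normal,matrix"
        "oneTextLine": "true" if one_text_line else "false",
    }
    # Omitting region leaves the service default (the whole image) in effect.
    if region is not None:
        params["region"] = format_region(*region)
    if placeholders_count is not None:
        params["placeholdersCount"] = str(placeholders_count)
    return params
```

For example, build_field_params(region=(10, 20, 210, 60), languages=("English", "French"), one_text_line=True) yields a dict whose region entry is "10,20,210,60" and whose language entry is "English,French".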

Status codes and response format

General status codes and response format of the method are described in HTTP Status Codes and Response Formats.

Output file format

The output XML file has the following format:

<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<document xmlns="@link" xmlns:xsi="@link" xsi:schemaLocation="@link" version="1.0">
    <field left="0" top="0" right="199" bottom="100" type="text">
        <value encoding="UTF-16">Data Capture Sample Text Data</value>
        <line left="0" top="0" right="199" bottom="100">
            <char left="0" top="0" right="199" bottom="100" confidence="98">
            D
            </char>
            ...
        </line>
        ...
    </field>
</document>
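
A minimal Python sketch of reading this output follows. It extracts the field value and lists characters whose confidence falls below a threshold, and it includes an optional client-side pattern check, since the regExp parameter does not guarantee that the result conforms to the expression. The function names, the threshold of 50, and the use of Python regular expression syntax for the check (which may differ from the syntax the service accepts) are all assumptions made for illustration.

```python
import re
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip an XML namespace prefix of the form '{uri}name'."""
    return tag.rsplit("}", 1)[-1]


def parse_field(xml_data, min_confidence=50):
    """Return (field value, [(char, confidence), ...] below min_confidence)."""
    root = ET.fromstring(xml_data)
    value = None
    uncertain = []
    for element in root.iter():
        name = local_name(element.tag)
        if name == "value" and value is None:
            value = element.text or ""
        elif name == "char":
            confidence = int(element.get("confidence", "0"))
            if confidence < min_confidence:
                # Character text is padded with whitespace in the sample output.
                uncertain.append(((element.text or "").strip(), confidence))
    return value, uncertain


def conforms(value, pattern):
    """Client-side check that the whole value matches a Python regex."""
    return re.fullmatch(pattern, value) is not None
```

Matching on local element names avoids hard-coding the namespace URI, which is not shown in the sample above. Note that stripping whitespace means a recognized space character is reported as an empty string.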
