processDocument Method

The method starts the processing task with the specified parameters.

Customize the following request URL according to your application processing location:

[POST] https://<PROCESSING_LOCATION_ID>.ocrsdk.com/v2/processDocument
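As a rough illustration, the request URL could be assembled as shown below. This is a minimal sketch rather than an official client: the helper name build_process_document_url, the location ID "example-location", and the task ID "my-task-id" are hypothetical placeholders, and the sketch assumes that the method parameters are passed in the query string of the POST request.

```python
from urllib.parse import urlencode


def build_process_document_url(location_id: str, task_id: str, **params: str) -> str:
    """Build the processDocument URL for one processing location.

    Assumption: parameters travel in the query string.
    """
    base = f"https://{location_id}.ocrsdk.com/v2/processDocument"
    query = {"taskId": task_id, **params}
    # Keep commas unescaped so multi-value parameters stay readable.
    return f"{base}?{urlencode(query, safe=',')}"


# Hypothetical placeholder values, for illustration only.
url = build_process_document_url(
    "example-location", "my-task-id", exportFormat="pdfa,txt,xml"
)
print(url)
```

The resulting URL would then be sent with the POST method using whatever authentication your application already uses for the service.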

This method allows you to process several images using the same settings and obtain the recognition result as a multi-page document. You can upload several images to one task using the submitImage method.

It is also possible to specify up to three file formats for the result, in which case the server response for the completed task will contain several result URLs.

Only a task with the Submitted, Completed, or NotEnoughCredits status can be started using this method.
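A client could check the task status locally before calling the method. The sketch below is only an illustration: the names STARTABLE_STATUSES and can_start_processing are hypothetical, and the status string is assumed to have been obtained elsewhere (for example, from an earlier server response).

```python
# Statuses from which a task can be started with processDocument.
STARTABLE_STATUSES = frozenset({"Submitted", "Completed", "NotEnoughCredits"})


def can_start_processing(task_status: str) -> bool:
    """Return True if a task in this status may be passed to processDocument."""
    return task_status in STARTABLE_STATUSES


print(can_start_processing("Submitted"))   # True
print(can_start_processing("SomeOther"))   # False ("SomeOther" is an arbitrary example value)
```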

Parameters

 
Parameter Is required Default value Description
taskId Yes No Specifies the identifier of the task. If the task with the specified identifier does not exist or has been deleted, an error is returned.
language No "English" Specifies the recognition language of the document. This parameter can contain several language names separated with commas, for example "English,French,German". See the list of available recognition languages.
profile No "documentConversion" Specifies a profile with predefined processing settings. It can be one of the following:
  • documentConversion
  • documentArchiving
  • textExtraction
  • barcodeRecognition
textType No "normal" Specifies the type of the text in the document. This parameter may also contain several text types separated with commas, for example "normal,matrix". The following values can be used:
  • normal
  • typewriter
  • matrix
  • index
  • ocrA
  • ocrB
  • e13b
  • cmc7
  • gothic
imageSource No "auto"

Specifies the source of the image. It can be either a scanned image, or a photograph created with a digital camera. Special preprocessing operations can be performed with the image depending on the selected source. For example, the system can automatically correct distorted text lines, poor focus and lighting on photos.

The value of this parameter can be one of the following:

  • auto
    The image source is detected automatically.
  • photo
  • scanner
correctOrientation No "true" Specifies whether the orientation of the image should be automatically detected and corrected. It can have one of the following values:
  • true
    The page orientation is automatically detected, and if it differs from normal, the image is rotated.
  • false
    Page orientation detection and correction are not performed.
correctSkew No "true" Specifies whether the skew of the image should be automatically detected and corrected. It can have either true or false value.
readBarcodes No "true" for xml export format and "false" in other cases Specifies whether barcodes must be detected on the image, recognized and exported to the result file. It can have either true or false value.
exportFormat No "rtf" Specifies the export format. This parameter can contain up to three export formats, separated with commas (example: "pdfa,txt,xml"). The available formats are:
  • txt
    The recognized text is exported to the file line by line from left to right. For example, if the text was originally laid out in columns, the first lines of every column are saved, then the second lines, and so on.
    Note that this format saves text only: no images or barcodes remain in the output file. If you want to save the barcode recognition results in the exported file, use the txtUnstructured format.
  • txtUnstructured
    The exported file contains the text that was saved according to the order of the original blocks. This format can be tuned with the txtUnstructured:paragraphAsOneLine parameter.
  • docx
  • xlsx
  • pptx
  • pdfSearchable
    The entire image is saved as a picture, and the recognized text is placed under it.
  • pdfTextAndImages
    The recognized text is saved as text, and the pictures are saved as pictures.
  • pdfa
    The file is saved in the PDF/A-1b format, with the entire image saved as a picture, and recognized text put under it.
  • xml
  • xmlForCorrectedImage
    The same as xml, but all coordinates written into the output XML file relate to the corrected image, not the original.
  • alto

If either of the XML export formats is selected, barcodes are recognized on the image and saved to the output XML regardless of the profile used for recognition.

Please note that setting multiple export formats does not affect the cost of task processing.

xml:writeFormatting No "false" Specifies whether the paragraph and character styles should be written to an output file in XML format. This parameter can be used only if the exportFormat parameter contains the xml or xmlForCorrectedImage value. The parameter can have one of the following values:
  • true
  • false
xml:writeRecognitionVariants No "false" Specifies whether character recognition variants should be written to an output file in XML format. This parameter can be used only if the exportFormat parameter contains the xml value. The parameter can have one of the following values:
  • true
  • false
xml:writeWordRecognitionVariants No "false" Specifies whether collections of word recognition variants should be written to an output file in XML format. This parameter can be used only if the exportFormat parameter contains the xml or xmlForCorrectedImage value. The parameter can have one of the following values:
  • true
  • false
pdf:writeTags No "auto"

Specifies whether the result must be written as tagged PDF. This parameter can be used only if the exportFormat parameter contains one of the values for export to PDF. It can have one of the following values:

  • auto
    Automatic selection: the tags are written into the output PDF file if it must comply with PDF/A-1a standard, and are not written otherwise.
  • write
  • dontWrite
description No "" Contains the description of the processing task. Cannot contain more than 255 characters.
txtUnstructured:paragraphAsOneLine No "false" Specifies whether each paragraph in the recognized text is exported as one line. The parameter can have one of the following values:
  • true
    The whole text of the paragraph is exported as one line; line breaks are removed.
  • false
    Line breaks between lines of the paragraph are kept as in the original document.
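
Several of the constraints in the table above can be checked on the client side before the request is sent. The following is a minimal sketch under stated assumptions, not a definitive validator: the function check_params and the constant names are hypothetical, the list of export formats is copied from the table, and the format checks are skipped when exportFormat is omitted.

```python
from typing import Dict, List

# Export formats listed in the parameter table.
EXPORT_FORMATS = {
    "txt", "txtUnstructured", "docx", "xlsx", "pptx", "pdfSearchable",
    "pdfTextAndImages", "pdfa", "xml", "xmlForCorrectedImage", "alto",
}
MAX_EXPORT_FORMATS = 3
MAX_DESCRIPTION_LENGTH = 255

# Option -> export formats, at least one of which must be requested.
OPTION_REQUIRES = {
    "xml:writeFormatting": {"xml", "xmlForCorrectedImage"},
    "xml:writeRecognitionVariants": {"xml"},
    "xml:writeWordRecognitionVariants": {"xml", "xmlForCorrectedImage"},
    "pdf:writeTags": {"pdfSearchable", "pdfTextAndImages", "pdfa"},
}


def check_params(params: Dict[str, str]) -> List[str]:
    """Return a list of human-readable problems; an empty list means none found."""
    problems: List[str] = []
    formats: List[str] = []
    if "exportFormat" in params:
        formats = [f.strip() for f in params["exportFormat"].split(",")]
        if len(formats) > MAX_EXPORT_FORMATS:
            problems.append("exportFormat lists more than three formats")
        unknown = [f for f in formats if f not in EXPORT_FORMATS]
        if unknown:
            problems.append("unknown export formats: " + ", ".join(unknown))
    for option, required in OPTION_REQUIRES.items():
        if option in params and not required.intersection(formats):
            problems.append(f"{option} needs one of: {', '.join(sorted(required))}")
    if len(params.get("description", "")) > MAX_DESCRIPTION_LENGTH:
        problems.append("description is longer than 255 characters")
    return problems


print(check_params({"taskId": "my-task-id", "exportFormat": "pdfa,txt,xml"}))
print(check_params({"exportFormat": "txt", "xml:writeFormatting": "true"}))
```

The server remains the authority on what is accepted; a local check like this only catches the obvious mistakes early.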

Status codes and response format

General status codes and response format of the method are described in HTTP Status Codes and Response Formats.

 
