Administrator's Guide

About ABBYY OCR Container

ABBYY OCR Container is a Linux-based Docker container that you can deploy in your container
orchestrator and use to convert documents. The container provides an OCR service through a REST API.

Getting started

To begin using ABBYY OCR Container, do the following:

  1. Log in to the Docker registry:
    docker login abbyycontainers.azurecr.io --username <username> --password <password>
  2. Pull the image from the registry:
    docker pull abbyycontainers.azurecr.io/ocr.container:1.0.0
  3. Run the container:
    docker run -d -p 80:5000 abbyycontainers.azurecr.io/ocr.container:1.0.0
  4. Open the container page http://localhost:80 in your browser. On this page you will find the EULA
    section where you can read the terms and conditions. To accept the EULA, set --env
    EulaAccepted=true when deploying the container. If you accept the EULA, stop the container
    started in step 3 (it already occupies port 80) and run the following command:
    docker run --env EulaAccepted=true -d -p 80:5000 abbyycontainers.azurecr.io/ocr.container:1.0.0

Note: The API methods are not available until the EULA has been accepted.

After running the container, you can open the http://<host>:<port> page (for example, http://localhost:80). On this page you will find links to the User Guide, the Swagger page with detailed API documentation, and container deployment examples.

Below is a list of application settings that you can change by redefining variables when deploying a
container:

Parameter Description
ASPNETCORE_URLS

Specifies the port in the container on which the
application is running.

By default, the application uses port 5000. If you
want to set a different port, for example port 80, set
the value of the parameter to "http://+:80".

EulaAccepted

Specifies whether the EULA has been accepted.

Set the value of the parameter to true if you have
read and accept the terms of the agreement.

Kestrel__Limits__MaxRequestBodySize

Specifies the maximum size of the request body in
bytes.

The request body size limit affects the maximum
size of the input image file. The default value of the
parameter is 50 MB. If you want to change this
limit, for example increase it to 100 MB, then set
the value of the parameter to "104857600".

Licensing__WorkingDirectory

Specifies the folder where the license files will be
stored.

By default, the license is stored in the /app/bin/linux/License folder in the container. In production, mount a folder from the cluster storage for the license so that all container replicas can share the same license. When deploying a container, set the value of the parameter to the path to this folder, for example "/mnt/license_data".

LicensingHqClient__Proxy__Use

Specifies whether a proxy will be used to send
requests to the license server.

The value of the parameter can be true or false.

LicensingHqClient__Proxy__Uri

A string containing the proxy address, including the port.

LicensingHqClient__Proxy__Domain

The domain where the user is located.

LicensingHqClient__Proxy__UserName

The username to authenticate on the proxy server.

LicensingHqClient__Proxy__Password

The user password to authenticate on the proxy server.

RecognizeService__ProcessingTimeoutSeconds

Specifies the maximum time (in seconds)
that the application can take to process an API
request.

By default, the value of the parameter is "1800" (30
minutes). Processing large multi-page
documents can take a long time (perhaps up to
several hours). In this case, you may need to
increase the response timeout of the container and
of your infrastructure (ingress, load balancer, etc.). In
the container, you can do this by changing the value
of this parameter.
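
The settings above are passed to the container as environment variables. The following Python sketch assembles a docker run command from a dictionary of settings. The helper names (megabytes_to_bytes, build_docker_run) are hypothetical, and the timeout of 3600 seconds is only an illustrative value. Because ASPNETCORE_URLS is set to port 80 in this example, the container port in the -p mapping is 80 as well.

```python
import shlex

IMAGE = "abbyycontainers.azurecr.io/ocr.container:1.0.0"


def megabytes_to_bytes(megabytes: int) -> int:
    """Convert a size in MB (1 MB = 1024 * 1024 bytes) to bytes."""
    return megabytes * 1024 * 1024


def build_docker_run(image: str, host_port: int, container_port: int, settings: dict) -> list:
    """Return the argument list for `docker run`, with one --env flag per setting."""
    args = ["docker", "run", "-d", "-p", f"{host_port}:{container_port}"]
    for name, value in settings.items():
        args += ["--env", f"{name}={value}"]
    args.append(image)
    return args


settings = {
    # Set this only after you have read and accepted the EULA.
    "EulaAccepted": "true",
    # The application listens on port 80 inside the container instead of 5000.
    "ASPNETCORE_URLS": "http://+:80",
    # Raise the request body limit from the default 50 MB to 100 MB.
    "Kestrel__Limits__MaxRequestBodySize": str(megabytes_to_bytes(100)),
    # Illustrative value: allow up to one hour per request.
    "RecognizeService__ProcessingTimeoutSeconds": "3600",
}

command = build_docker_run(IMAGE, 80, 80, settings)
# Print the command as a single shell-quoted line that can be pasted into a terminal.
print(shlex.join(command))
```

The argument list can also be passed directly to subprocess.run(command, check=True) on a machine where Docker is installed.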

Kubernetes Deployment Example

This example (K8s deployment sample link) shows how to deploy a container to a Kubernetes cluster using a
YML file.

To deploy a container, do the following:

  1. Install the kubectl utility on the machine that will be used to deploy the container.
  2. On the container web page, download the ocr-example.yml file and edit it according to your
    environment settings. In the Deployment section, specify the name of the container image in the
    spec/template/spec/containers/image parameter. In the Ingress section, specify the
    spec/rules/host and spec/tls/hosts using the address where the container should be accessible.
    Specify the name of your namespace in all namespace parameters. Change any other settings
    appropriately if required.
  3. Check that the cluster where you want to deploy the container is selected in the current context:
    kubectl config get-contexts
  4. Deploy the container:
    kubectl apply -f ocr-example.yml

Helm Chart Example

This example (Helm chart sample link) shows how to deploy a container to a Kubernetes cluster using a
Helm chart.

To deploy a container, do the following:

  1. Install the helm utility on the machine that will be used to deploy the container.
  2. On the container web page, download the helm chart sample (ocr-container-0.1.0.tgz).
  3. Unpack ocr-container-0.1.0.tgz, and then unpack ocr-container-0.1.0.tar.
  4. In the ocr-container-0.1.0 folder, open the values.yaml file in a text editor.
  5. Fill in the parameters with values that meet your requirements and your environment. Save the
    values.yaml file.
  6. Check that the cluster where you want to deploy the container is selected in the current context:
    kubectl config get-contexts
  7. Run the command below:
    helm install name chart flags
    with the following parameters:

    Parameter   Description
    name        Release name.
    chart       The path to the folder where the Chart.yaml and values.yaml files are located.
    flags       Specifies the flags if needed.

Command example:

helm install ocr-container-release D:\ocr-container-0.1.0\ocr-container-0.1.0\ocrcontainer -n ocr-test

Licensing

Using ABBYY OCR Container requires a license, which determines the following parameters:

  • the number of pages that can be processed throughout the license period,
  • the expiration date of the license,
  • license update information,
  • information on sending license usage statistics.

After you get the license file, you need to install it using the setLicense method.

REST API

ABBYY OCR Container REST API is described in detail on the Swagger page, where you can also try out
API calls and receive the results. You can also use the Swagger page to generate bits of code that can be
reused in your client application.

Swagger Codegen or a similar tool will help you generate client code in different programming languages. Use the specification at http://<host>:<port>/swagger/index.html.

License

pageCount

GET http://<host>:<port>/api/v1/License/pageCount

Returns the number of pages that have been processed using the license.

rawStatistic

GET http://<host>:<port>/api/v1/License/rawStatistic

Returns a string for sending statistics to ABBYY. The method is used when the container works without
access to the Internet.

setLicense

GET http://<host>:<port>/api/v1/License/setLicense

Installs a license in a container. Has 1 required parameter:

  • LicenseFile - the license file to be uploaded.

statistic

GET http://<host>:<port>/api/v1/License/statistic

Returns page usage statistics.

status

GET http://<host>:<port>/api/v1/License/status

Returns the container license status. The license can be active or inactive.

If the license is inactive, the response also includes a description of the reason (blocked, expired,
no license, damaged, etc.).

The response to the request contains:

  • a serial number,
  • the number of pages that can be processed,
  • the number of remaining pages,
  • the license expiration date.

updateLicense

GET http://<host>:<port>/api/v1/License/updateLicense

Updates the license manually. You can use this method to have the container update its license from the
ABBYY server. If a license is updated after it has expired, the information in the container may not be
updated immediately. How soon it is updated depends on how often the updates occur (the update period is
set in the license settings).
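
The License methods listed above that are called with GET and take no parameters can be wrapped in one small helper. The Python sketch below uses only the standard library. The function names are hypothetical, and because the format of the response body is not described here, the body is returned as raw text; check the Swagger page for the actual response schema.

```python
import urllib.request

# License methods from this section that are documented as parameterless GET requests.
LICENSE_GET_METHODS = ("pageCount", "rawStatistic", "statistic", "status", "updateLicense")


def license_url(base_url: str, method: str) -> str:
    """Build the URL of a License method, for example .../api/v1/License/status."""
    if method not in LICENSE_GET_METHODS:
        raise ValueError(f"Unsupported License method: {method}")
    return f"{base_url.rstrip('/')}/api/v1/License/{method}"


def call_license_method(base_url: str, method: str, timeout: float = 30.0) -> str:
    """Send the GET request and return the response body as text (format not assumed)."""
    with urllib.request.urlopen(license_url(base_url, method), timeout=timeout) as response:
        return response.read().decode("utf-8")
```

For example, call_license_method("http://localhost:80", "status") would request the license status from a container published on port 80 of the local machine.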

Recognize

process

POST http://<host>:<port>/api/v1/process

Works synchronously. Loads an image file, recognizes the text found on the image, and exports it to the
specified format.

Note that the service can only process one request at a time. If it's busy with another task, the request will
receive a 503 response code and won't be executed. You may want to implement a task queue on your
side to ensure that the images are sent one by one.
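
A client-side queue of this kind could look like the following Python sketch. It sends documents one at a time and retries a document when the service answers 503. The send argument is a hypothetical callable that you supply: it posts one document to the process method and returns the HTTP status code and the response body. The retry delay and the attempt limit are illustrative defaults, not values recommended by ABBYY.

```python
import time
from typing import Callable, Iterable, List, Tuple


def process_sequentially(
    documents: Iterable[str],
    send: Callable[[str], Tuple[int, bytes]],
    retry_delay: float = 5.0,
    max_attempts: int = 10,
    sleep: Callable[[float], None] = time.sleep,
) -> List[Tuple[str, int, bytes]]:
    """Send documents one by one, retrying each one while the service is busy (503)."""
    results = []
    for document in documents:
        for attempt in range(1, max_attempts + 1):
            status, body = send(document)
            if status != 503:
                # The request was executed; stop retrying this document.
                break
            if attempt < max_attempts:
                # The service is busy with another task; wait before the next attempt.
                sleep(retry_delay)
        else:
            # Every attempt returned 503.
            raise RuntimeError(f"Service still busy after {max_attempts} attempts: {document}")
        # Any status other than 503 is handed back to the caller for inspection.
        results.append((document, status, body))
    return results
```

Because the next document is sent only after the previous call has returned, the container never receives two processing requests from this client at the same time.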

Note: The OCR Container supports processing of documents up to 3000 pages.

Has 5 required parameters:

  • AutoCropImage - specifies whether the image should be cropped automatically: image borders are
    detected and cropped, skewing and distortions are also fixed. The value of the parameter can be True or False.
  • AutoCorrectOrientation - specifies whether the image should be rotated automatically. The value of the parameter can be True or False.
  • File - the file to be processed (image or PDF document).
  • Format - the format of the file that will be returned. See a list of supported export formats and the
    values to pass in the Export formats table. The parameter is not case-sensitive.
  • Language - a recognition language. You can specify one or several languages using the internal names of the languages separated by commas. See a list of supported languages and the correct name for each in the Recognition languages table. The parameter is not case-sensitive.

Has 1 optional parameter:

  • Barcodes - barcode types that will be recognized. See a list of supported barcode types and the
    correct name for each in the Barcode types table. The parameter is not case-sensitive.
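
The parameters listed above, apart from File, can be collected into a dictionary before the request is sent. The Python sketch below does this. The function name is hypothetical, and the values "pdf", "English", and "German" in the usage note are placeholders; take the real values from the Export formats and Recognition languages tables. How the parameters are transmitted (query string or form fields) is not covered here, so consult the Swagger page for that.

```python
from typing import Dict, Iterable, Optional


def build_process_params(
    format_name: str,
    languages: Iterable[str],
    auto_crop_image: bool = False,
    auto_correct_orientation: bool = False,
    barcodes: Optional[str] = None,
) -> Dict[str, str]:
    """Collect the parameters of the process method, except File, as strings."""
    language_list = list(languages)
    if not language_list:
        raise ValueError("At least one recognition language is required.")
    params = {
        # Boolean parameters are passed as "True" or "False".
        "AutoCropImage": str(auto_crop_image),
        "AutoCorrectOrientation": str(auto_correct_orientation),
        "Format": format_name,
        # Several languages are separated by commas.
        "Language": ",".join(language_list),
    }
    if barcodes is not None:
        # Optional parameter; the value is passed through unchanged.
        params["Barcodes"] = barcodes
    return params
```

For example, build_process_params("pdf", ["English", "German"], auto_crop_image=True) produces a dictionary in which Language is "English,German" and AutoCropImage is "True".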

Health check

ABBYY OCR Container provides HTTP endpoints that can be used to monitor the state of the
container.

Liveness

http://<host>:<port>/liveness

Informs you that the container is running.

Responses:

  • 200 OK — Service Available (Healthy)
  • 503 — Service Unavailable (Unhealthy)

Readiness

http://<host>:<port>/readiness

Informs you that the service is free and can accept a new processing request.

Responses:

  • 200 OK — Service Available (Healthy)
  • 503 — Service Unavailable (Unhealthy)
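
A client can poll the readiness endpoint before it submits a document. The Python sketch below waits until the endpoint answers 200. It assumes that the endpoint accepts a plain GET request, which is not stated explicitly here; the function names, the number of attempts, and the polling interval are illustrative.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def probe(url: str, timeout: float = 5.0) -> int:
    """Return the HTTP status code of a GET request to url, or 0 if it is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # urllib raises HTTPError for 4xx and 5xx answers, including 503.
        return error.code
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout, and similar errors.
        return 0


def wait_until_ready(
    base_url: str,
    probe_fn: Callable[[str], int] = probe,
    attempts: int = 30,
    interval: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll <base_url>/readiness until it returns 200 or the attempts run out."""
    url = f"{base_url.rstrip('/')}/readiness"
    for attempt in range(attempts):
        if probe_fn(url) == 200:
            return True
        if attempt < attempts - 1:
            sleep(interval)
    return False
```

For example, wait_until_ready("http://localhost:80") returns True as soon as the service is free, and False if it stays busy or unreachable for all attempts.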

For Specifications, Third-party Software/Open Source Software, JSON schema and XML schema check the attachment.
