About ABBYY OCR Container
ABBYY OCR Container is a Linux-based Docker container that you can deploy in your container
orchestrator and use to convert documents. The container provides an OCR service through a REST API.
Getting started
To begin using ABBYY OCR Container, do the following:
- Log in to the Docker registry:
docker login abbyycontainers.azurecr.io --username <username> --password <password>
- Pull the image from the registry:
docker pull abbyycontainers.azurecr.io/ocr.container:1.0.0
- Run the container:
docker run -d -p 80:5000 abbyycontainers.azurecr.io/ocr.container:1.0.0
- Open the container page http://localhost:80 in your browser. On this page you will find the EULA
section where you can read the terms and conditions. To accept the EULA, set --env
EulaAccepted=true when deploying the container. If you have accepted the EULA, run the following command:
docker run --env EulaAccepted=true -d -p 80:5000 abbyycontainers.azurecr.io/ocr.container:1.0.0
Note: Without accepting the EULA, the API methods will not be available.
After running the container, you can open the http://<host>:<port> page (for example, http://localhost:80). On this page you will find links to the User Guide, the Swagger page with detailed API documentation, and container deployment examples.
Below is a list of application settings that you can change by overriding environment variables when
deploying a container:
Parameter | Description |
---|---|
ASPNETCORE_URLS | Specifies the port in the container on which the application listens. By default, the application uses port 5000. |
EulaAccepted | Specifies whether the EULA has been accepted. Set the value of the parameter to true if you have accepted the EULA. |
Kestrel__Limits__MaxRequestBodySize | Specifies the maximum size of the request body in bytes. The request body size limit affects the maximum size of a file that can be sent for processing. |
Licensing__WorkingDirectory | Specifies the folder where the license files will be stored. By default, the license is stored in the /app/bin/linux/License folder in the container. When running a container in production, to store a license, you need to mount a folder in the cluster storage, so that all container replicas can share the same license. When deploying a container, set the value of the parameter to the path to this folder, for example "/mnt/license_data". |
LicensingHqClient__Proxy__Use | Specifies whether a proxy will be used to send requests to the license server. The value of the parameter can be true or false. |
LicensingHqClient__Proxy__Uri | A string containing the proxy address, including the port. |
LicensingHqClient__Proxy__Domain | The domain where the user is located. |
LicensingHqClient__Proxy__UserName | The username to authenticate on the proxy server. |
LicensingHqClient__Proxy__Password | The user password to authenticate on the proxy server. |
RecognizeService__ProcessingTimeoutSeconds | Specifies the maximum time (specified in seconds) that an application can take to process an API request. By default, the value of the parameter is "1800" (30 minutes). If you process large multi-page documents, it can take a long time (perhaps up to several hours). In this case, you may need to increase the response timeout of the container and your infrastructure (ingress, load balancer, etc.). In the container, you can do it by changing the value of this parameter. |
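To illustrate how these settings are passed at deployment time, the following Python sketch builds a docker run command line from a dictionary of settings. The helper name build_docker_run, the example timeout of 3600 seconds, and the /mnt/license_data mount are assumptions made for this illustration; only the parameter names and the image name come from this guide.

```python
import shlex

IMAGE = "abbyycontainers.azurecr.io/ocr.container:1.0.0"


def build_docker_run(settings, host_port=80, container_port=5000, volumes=None):
    """Return the argument list for `docker run` with the given settings.

    settings: mapping of parameter name -> value, passed as --env NAME=VALUE.
    volumes:  mapping of host folder -> container folder, passed as -v.
    """
    args = ["docker", "run", "-d", "-p", f"{host_port}:{container_port}"]
    for host_dir, container_dir in (volumes or {}).items():
        args += ["-v", f"{host_dir}:{container_dir}"]
    for name, value in settings.items():
        args += ["--env", f"{name}={value}"]
    args.append(IMAGE)
    return args


if __name__ == "__main__":
    cmd = build_docker_run(
        {
            # Set EulaAccepted only after you have read and accepted the EULA.
            "EulaAccepted": "true",
            "Licensing__WorkingDirectory": "/mnt/license_data",
            "RecognizeService__ProcessingTimeoutSeconds": "3600",
        },
        volumes={"/mnt/license_data": "/mnt/license_data"},
    )
    # Print the command for review; to execute it, use subprocess.run(cmd, check=True).
    print(shlex.join(cmd))
```

Printing the command instead of running it makes it easy to review the resulting flags before deploying.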
Kubernetes Deployment Example
This example (K8s deployment sample link) shows how to deploy a container to a Kubernetes cluster using a
YAML file.
To deploy a container, do the following:
- Install the kubectl utility on the machine that will be used to deploy the container.
- On the container web page, download the ocr-example.yml file and edit it according to your
environment settings. In the Deployment section, specify the name of the container image in the
spec/template/spec/containers/image parameter. In the Ingress section, specify the
spec/rules/host and spec/tls/hosts using the address where the container should be accessible.
Specify the name of your namespace in all namespace parameters. Change any other settings
as required.
- Check that the cluster where you want to deploy the container is selected in the current context:
kubectl config get-contexts
- Deploy the container:
kubectl apply -f ocr-example.yml
Helm Chart Example
This example (Helm chart sample link) shows how to deploy a container to a Kubernetes cluster using a
helm chart.
To deploy a container, do the following:
- Install the helm utility on the machine that will be used to deploy the container.
- On the container web page, download the helm chart sample (ocr-container-0.1.0.tgz).
- Unpack ocr-container-0.1.0.tgz, and then unpack the resulting ocr-container-0.1.0.tar.
- In the ocr-container-0.1.0 folder, open the values.yaml file in a text editor.
- Fill in the parameters with values that meet your requirements and your environment. Save the
values.yaml file.
- Check that the cluster where you want to deploy the container is selected in the current context:
kubectl config get-contexts
- Run the command below:
helm install name chart flags
with the following parameters:
Parameter | Description |
---|---|
name | Release name. |
chart | The path to the folder where the Chart.yaml and values.yaml files are located. |
flags | Specifies the flags if needed. |
Command example:
helm install ocr-container-release D:\ocr-container-0.1.0\ocr-container-0.1.0\ocrcontainer -n ocr-test
Licensing
Using ABBYY OCR Container requires a license, which determines the following parameters:
- the number of pages that can be processed throughout the license period,
- the expiration date of the license,
- license update information,
- information on sending license usage statistics.
After you get the license file, you need to install it using the setLicense method.
REST API
ABBYY OCR Container REST API is described in detail on the Swagger page, where you can also try out
API calls and receive the results. You can also use the Swagger page to generate bits of code that can be
reused in your client application.
Swagger Codegen or a similar tool will help you generate client code in different programming languages. Use the specification at http://<host>:<port>/swagger/index.html.
License
pageCount
GET http://<host>:<port>/api/v1/License/pageCount
Returns the number of pages that have been processed using the license.
rawStatistic
GET http://<host>:<port>/api/v1/License/rawStatistic
Returns a string for sending statistics to ABBYY. The method is used when the container works without
access to the Internet.
setLicense
GET http://<host>:<port>/api/v1/License/setLicense
Installs a license in a container. Has 1 required parameter:
- LicenseFile - the license file to be uploaded.
statistic
GET http://<host>:<port>/api/v1/License/statistic
Returns page usage statistics.
status
GET http://<host>:<port>/api/v1/License/status
Returns the container license status. The license can be active or inactive.
If the license is inactive, the response also includes a description of the reason (blocked, expired,
no license, damaged, etc.).
The response to the request contains:
- a serial number,
- the number of pages that can be processed,
- the number of remaining pages,
- the license expiration date.
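The GET methods of the License group can be called from any HTTP client. Below is a minimal Python sketch that uses only the standard library. The function names license_url and get_license_info are hypothetical, and the response body is returned as plain text because its exact layout is not described in this guide; see the Swagger page for the actual schema.

```python
import urllib.error
import urllib.request


def license_url(base_url, method):
    """Build the URL of a License API method, e.g. 'status' or 'pageCount'."""
    return f"{base_url.rstrip('/')}/api/v1/License/{method}"


def get_license_info(base_url, method="status", timeout=30):
    """Send a GET request and return (HTTP status code, response body as text)."""
    request = urllib.request.Request(license_url(base_url, method), method="GET")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status, response.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as error:
        # Error responses may still carry a body that explains the problem.
        return error.code, error.read().decode("utf-8", errors="replace")


# Example call (requires a running container):
#   code, body = get_license_info("http://localhost:80", "status")
#   print(code, body)
```

The same helper can be pointed at pageCount or statistic by changing the method argument.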
updateLicense
GET http://<host>:<port>/api/v1/License/updateLicense
Updates the license manually. You can use this method to have the container update its license from the
ABBYY server. If a license is updated after it has expired, the information in the container may not be
updated immediately; this depends on how often the updates occur (the update period is set in the
license settings).
Recognize
process
POST http://<host>:<port>/api/v1/process
Works synchronously. Loads an image file, recognizes the text found on the image, and exports it to the
specified format.
Note that the service can only process one request at a time. If it's busy with another task, the request will
receive a 503 response code and won't be executed. You may want to implement a task queue on your
side to ensure that the images are sent one by one.
Note: The OCR Container supports processing of documents up to 3000 pages.
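One way to send files one by one is a client-side queue served by a single worker thread. The Python sketch below is an illustration under stated assumptions, not part of the product: SerialSubmitter is a hypothetical class, and send stands for whatever function in your application performs the actual request to the process method.

```python
from concurrent.futures import ThreadPoolExecutor


class SerialSubmitter:
    """Sends jobs to the container one at a time, in submission order."""

    def __init__(self, send):
        self._send = send  # callable(job) -> result; performs the API request
        # A single worker thread means at most one request is in flight.
        self._executor = ThreadPoolExecutor(max_workers=1)

    def submit(self, job):
        """Queue a job; returns a Future whose result() is send(job)."""
        return self._executor.submit(self._send, job)

    def close(self):
        """Wait for the queued jobs to finish and stop the worker."""
        self._executor.shutdown(wait=True)
```

This serializes requests only within one client process. If several clients or several container replicas are involved, a 503 response is still possible and should be handled.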
Has 5 required parameters:
- AutoCropImage - specifies whether the image should be cropped automatically: image borders are
detected and cropped, skewing and distortions are also fixed. The value of the parameter can be True or False.
- AutoCorrectOrientation - specifies whether the image should be rotated automatically. The value of the parameter can be True or False.
- File - the file to be processed (image or PDF document).
- Format - the format of the file that will be returned. See a list of supported export formats and their
values to pass in the Export formats table. The parameter is not case-sensitive.
- Language - a recognition language. You can specify one or several languages using the internal names of the languages separated by commas. See a list of supported languages and the correct name for each in the Recognition languages table. The parameter is not case-sensitive.
Has 1 optional parameter:
- Barcodes - barcode types that will be recognized. See a list of supported barcode types and the
correct name for each in the Barcode types table. The parameter is not case-sensitive.
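A call to the process method can be sketched in Python with the standard library as shown below. This sketch assumes that all parameters are sent as multipart/form-data fields; this guide does not state how they are transmitted, so verify it on the Swagger page. The function names, the retry count, and the delay are hypothetical, and the values for Format and Language must be taken from the Export formats and Recognition languages tables.

```python
import os
import time
import urllib.error
import urllib.request
import uuid


def encode_multipart(fields, file_field, file_name, file_bytes):
    """Encode text fields and one file as multipart/form-data.

    Returns (body bytes, Content-Type header value).
    """
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts += [
            f"--{boundary}".encode(),
            f'Content-Disposition: form-data; name="{name}"'.encode(),
            b"",
            str(value).encode("utf-8"),
        ]
    parts += [
        f"--{boundary}".encode(),
        f'Content-Disposition: form-data; name="{file_field}"; '
        f'filename="{file_name}"'.encode(),
        b"Content-Type: application/octet-stream",
        b"",
        file_bytes,
        f"--{boundary}--".encode(),
        b"",
    ]
    return b"\r\n".join(parts), f"multipart/form-data; boundary={boundary}"


def process_file(base_url, path, export_format, language,
                 timeout=1800, attempts=5, delay=10.0):
    """POST a file to /api/v1/process, retrying while the service answers 503."""
    with open(path, "rb") as handle:
        data = handle.read()
    fields = {
        "AutoCropImage": "False",
        "AutoCorrectOrientation": "True",
        "Format": export_format,  # value from the Export formats table
        "Language": language,     # name(s) from the Recognition languages table
    }
    body, content_type = encode_multipart(
        fields, "File", os.path.basename(path), data)
    request = urllib.request.Request(
        f"{base_url.rstrip('/')}/api/v1/process",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
    for attempt in range(1, attempts + 1):
        try:
            with urllib.request.urlopen(request, timeout=timeout) as response:
                return response.read()  # the exported document
        except urllib.error.HTTPError as error:
            # 503 means the service is busy; any other error is re-raised at once.
            if error.code != 503 or attempt == attempts:
                raise
            time.sleep(delay)
```

The default client timeout of 1800 seconds mirrors the default value of RecognizeService__ProcessingTimeoutSeconds; if you raise that setting for large documents, raise the client timeout as well.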
Health check
ABBYY OCR Container provides HTTP endpoints that can be used to monitor the state of the
container.
Liveness
http://<host>:<port>/liveness
Informs you that the container is running.
Responses:
- 200 OK — Service Available (Healthy)
- 503 — Service Unavailable (Unhealthy)
Readiness
http://<host>:<port>/readiness
Informs you that the service is free and can accept a new processing request.
Responses:
- 200 OK — Service Available (Healthy)
- 503 — Service Unavailable (Unhealthy)
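A client can poll the readiness endpoint before sending a new processing request. The Python sketch below shows one possible approach; the function names probe and wait_until_ready, the number of attempts, and the delay are assumptions chosen for illustration.

```python
import time
import urllib.error
import urllib.request


def probe(url, timeout=5):
    """Return the HTTP status code of a health endpoint, or None if unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code  # for example, 503 while the service is busy
    except (urllib.error.URLError, OSError):
        return None        # the container cannot be reached yet


def wait_until_ready(base_url, attempts=30, delay=2.0, check=probe, sleep=time.sleep):
    """Poll /readiness until it answers 200; return True on success."""
    url = f"{base_url.rstrip('/')}/readiness"
    for _ in range(attempts):
        if check(url) == 200:
            return True
        sleep(delay)
    return False


# Example call (requires a running container):
#   if wait_until_ready("http://localhost:80"):
#       print("The service can accept a processing request.")
```

The check and sleep arguments exist only so that the polling logic can be exercised without a running container.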
For Specifications, Third-party Software/Open Source Software, JSON schema, and XML schema, see the attachment.