Part # 1443/1
Product # 1.0.0
Technologies # OCRT 16.2.723.48
Protection # 1.0.1.80
- Deployment and running
To work with ABBYY OCR Container, download it from the ABBYY Container Registry. The link is provided together with your trial license.
Before running the container, you must read and accept the terms of the EULA. Please refer to the documentation for more details on accepting the terms of the EULA.
When you first run the container, a demo page with links to the Swagger UI and the documentation becomes available.
- Processing documents
The ABBYY OCR Container REST API is described in detail on the Swagger page, where you can also try out API calls, view the results, and generate code snippets to reuse in your client application.
Documents are processed with the Recognize method. This method works synchronously: it loads an image file, recognizes the text in the image, and exports the result to a format of your choice. Please note that the service can process only one request at a time. If it is busy with another task, the new request receives a 503 response code and is not executed.
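Because a busy service rejects the request with a 503 instead of queuing it, the client has to resend it. The following Python sketch shows one possible way to do that with exponential backoff. The function name send_with_retry, the retry count, and the delays are illustrative assumptions, not part of the product API. The send argument stands for whatever code in your client actually performs the Recognize call.

```python
import time
import urllib.error


def send_with_retry(send, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call send() and retry with exponential backoff while it fails with 503.

    send is a zero-argument callable (hypothetical, supplied by the client)
    that performs the HTTP request and raises urllib.error.HTTPError on an
    error status. Any status other than 503 is re-raised immediately.
    """
    for attempt in range(max_attempts):
        try:
            return send()
        except urllib.error.HTTPError as err:
            # 503 means the service is busy with another task; other errors
            # (and the final failed attempt) are passed on to the caller.
            if err.code != 503 or attempt == max_attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)
```

With the default values, the waits between attempts are 1, 2, 4, and 8 seconds. Since processing is synchronous, a suitable delay depends on how long your typical documents take to process.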
The Recognize method has the following settings:
- Language. Required parameter. Specifies the set of languages to be used for OCR. Please refer to Help for the full list of supported languages. Please note that choosing too many languages may reduce performance and recognition quality.
- AutoCrop. Required parameter. We recommend enabling this parameter by default. If AutoCrop is enabled, all incoming documents are automatically cropped where required for best OCR quality.
- AutoCorrectOrientation. Required parameter. This parameter is enabled by default. If AutoCorrectOrientation is enabled, page orientation is corrected where required.
- Format. Required parameter. Specifies the output format. Only one output format can be selected from the following:
- JSONTextOnly*
- JSONPreserveDocumentStructure*
- Txt
- DocxEditable
- DocxExact
- Xlsx
- PDF_A_3a
- PDF_A_3b
- PDFImageOnly
- XMLTextOnly*
- XMLPreserveDocumentStructure*
- Tiff
- Jpeg
- Jpeg2000
- Png
- Html
*For JSON and XML outputs:
Text Only. The exported file will only contain recognized text but the document layout will not be preserved. This mode is more efficient for structured documents, such as invoices and receipts, and is focused on extracting text blocks.
Preserve Document Structure. The exported file will contain recognized text and the document layout will be preserved. This mode is more efficient for documents without a predefined structure, such as contracts and agreements. It has been specifically designed to detect and preserve the structure of recognized documents.
- Barcodes. Optional parameter. A list of barcodes to be recognized. No barcodes are selected by default.
- File. Required parameter. The file to be processed.
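The settings above can be checked on the client side before a request is sent. The Python sketch below is only an illustration: the function name build_recognize_settings is hypothetical, and the way the settings are serialized in the actual request (query string, form fields, value encoding) is deliberately not assumed here. Please check the Swagger page for the exact request shape. The File parameter is left out because it is the upload itself.

```python
# Output formats listed for the Format parameter.
OUTPUT_FORMATS = {
    "JSONTextOnly", "JSONPreserveDocumentStructure", "Txt", "DocxEditable",
    "DocxExact", "Xlsx", "PDF_A_3a", "PDF_A_3b", "PDFImageOnly",
    "XMLTextOnly", "XMLPreserveDocumentStructure", "Tiff", "Jpeg",
    "Jpeg2000", "Png", "Html",
}


def build_recognize_settings(languages, output_format, auto_crop=True,
                             auto_correct_orientation=True, barcodes=None):
    """Validate Recognize settings and return them as a plain dict.

    The keys mirror the documented parameter names. How they are encoded
    in the request is an open assumption and is not handled here.
    """
    if isinstance(languages, str):
        languages = [languages]  # accept a single language name
    if not languages:
        raise ValueError("Language is required: pass at least one language")
    if output_format not in OUTPUT_FORMATS:
        raise ValueError(f"Unsupported Format: {output_format!r}")
    settings = {
        "Language": list(languages),
        "AutoCrop": bool(auto_crop),
        "AutoCorrectOrientation": bool(auto_correct_orientation),
        "Format": output_format,  # exactly one output format per request
    }
    if barcodes:  # Barcodes is optional; omit it when nothing is selected
        settings["Barcodes"] = list(barcodes)
    return settings
```

Validating the format name locally avoids spending the single available processing slot on a request that would be rejected anyway.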
- Licensing
You need a license to run ABBYY OCR Container.
After you get a license file, you need to install it using the setLicense method.
Two licensing options are available:
- Connected license. This type of license will periodically send usage statistics to ABBYY servers. Only information about page consumption within a period will be sent.
- Disconnected license. This type of license can only be obtained upon special request to ABBYY. No usage statistics will be sent automatically to ABBYY servers. To send usage statistics manually, special API calls should be used.
- Liveness and readiness checks
ABBYY OCR Container provides HTTP endpoints that can be used to monitor the state of the container.
Liveness: http://<host>:<port>/liveness. Informs you that the container is running. Responses:
- 200 OK — Service Available (Healthy)
- 503 — Service Unavailable (Unhealthy)
Readiness: http://<host>:<port>/readiness. Informs you that the service is free and can accept a new processing request. Responses:
- 200 OK — Service Available (Healthy)
- 503 — Service Unavailable (Unhealthy)
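Since the readiness endpoint reports whether the service can accept a new request, a client can poll it before submitting a document. The Python sketch below is a minimal illustration using only the standard library. The function names probe and wait_until_ready, the number of attempts, and the polling interval are assumptions chosen for the example.

```python
import time
import urllib.error
import urllib.request


def probe(url, timeout=5.0):
    """Return the HTTP status code of a GET to url, or None if unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # for example 503 while the service is busy
    except (urllib.error.URLError, OSError):
        return None  # container not reachable (yet)


def wait_until_ready(base_url, attempts=30, interval=2.0,
                     probe=probe, sleep=time.sleep):
    """Poll <base_url>/readiness until it answers 200 or attempts run out."""
    url = base_url.rstrip("/") + "/readiness"
    for _ in range(attempts):
        if probe(url) == 200:
            return True
        sleep(interval)
    return False
```

For example, wait_until_ready("http://<host>:<port>") returns True as soon as the service reports that it is free. The same probe function can be pointed at the liveness endpoint to check that the container is running at all.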
- Known limitations
- Handwritten text is recognized only if JSONTextOnly or XMLTextOnly is selected as the output format.
- Handwritten text recognition is supported for English, German, and French only.
- Processing large files (more than 100 pages) may require additional RAM and disk space (to store cached data).
- When proxy settings are taken from docker.config, only the proxy server URL is used in the container. The proxy user name and password must be specified as environment variables in the container.
- We recommend processing files of no more than 3,000 pages with ABBYY OCR Container.
Note: In case of any questions, please get in touch with our Customer Support team.