Parallel processing in FlexiCapture

Question

How are workloads distributed between Processing Stations? Can a batch be split so that it can be processed on multiple Stations and CPU cores? Is it possible to split tasks? 

Answer

Work is distributed between Processing Stations in the form of tasks. A single task cannot be processed on several FlexiCapture stations. However, if a station has multiple CPU cores, it can process several tasks at the same time.

Some batches in FlexiCapture can be processed on multiple cores. Whether a batch can be processed on more than one core depends on the properties of the project and of the batch type. Typically a project is set up to process all the documents of a batch in a single task, so the whole batch is processed by one core.

How the workload is distributed between stations, and then between processor cores, depends on the FlexiCapture 12 installation option.

FlexiCapture Standalone
In FlexiCapture Standalone, import, recognition and export tasks are performed by the station on which they were started by the user.
Import and Export tasks are always processed by a single processor core.
Recognition of a batch of documents can be processed in parallel on several CPU cores of a Station if the project or batch type does not contain published multi-page FlexiLayouts, or if the batch is split into separate documents during import (either on a Scanning Station or according to the Image Import Profile settings). In this case the batch is divided into sets of about 10 documents each (if it has already been split into documents, each set may contain a little more or less than 10 documents, but always a whole number of documents). These sets are distributed between different cores.
If the project or batch type contains published multi-page FlexiLayouts but the batch is not split into individual documents, it cannot be processed on multiple cores in parallel and will be processed by a single core of the Station.
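The division of a batch into sets can be pictured with a minimal Python sketch. It is only an illustration, not FlexiCapture's actual implementation: the function name `split_into_sets` and the fixed set size are assumptions, and the real product may produce sets that are slightly larger or smaller than 10 documents.

```python
def split_into_sets(documents, target_size=10):
    """Split a list of documents into consecutive sets of at most target_size.

    Hypothetical helper: FlexiCapture only promises sets of "about 10"
    whole documents; fixed-size slicing is the simplest way to model that.
    """
    return [
        documents[i:i + target_size]
        for i in range(0, len(documents), target_size)
    ]


# A batch that was already split into 28 documents during import.
batch = [f"document-{n}" for n in range(1, 29)]
sets = split_into_sets(batch)

# Each set always holds whole documents and could go to a separate core.
print([len(s) for s in sets])  # [10, 10, 8]
```

Each resulting set is an independent unit of work, which is what allows the sets to be handed to different cores.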

FlexiCapture Distributed

Task allocation is handled differently in the Distributed version of FlexiCapture: import, recognition and export tasks are allocated to Processing Stations by the Processing Server. When allocating tasks, the Processing Server takes the following factors into account:

  1. The number of available processing cores on the Station.
    The Processing Server also determines which resources can be used to process tasks and uses this data to select the most appropriate Station for each task.
  2. Data in the Station's cache.
    The Processing Server checks if the batch is stored in the Station's cache. If it is, the station is far more likely to be selected for processing this task.
    If the workflow contains several consecutive automatic stages that a Processing Station can process one after another, the station does so without additional communication with the Application Server. This reduces the number of times data is needlessly transferred between the Application Server and Processing Stations. This feature is called "stages merge".
  3. Whether it is possible to process a task on a given Processing Station.
    The Processing Server has a load-balancing mechanism that determines which stations are unsuitable for processing a given task.

When a new task is created, the Processing Server evaluates all of these factors and allocates the task to a Station. If the task is a recognition task or an export to PDF with a text layer, it is divided between the station's free processing cores.
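The three factors above can be pictured with a short Python sketch. It is a simplified model under stated assumptions, not the Processing Server's real algorithm: the `Station` class, its fields and the `select_station` function are hypothetical, and treating a cache hit as a strict priority over the number of free cores is an assumption (the article only says a cached station is "far more likely" to be selected).

```python
from dataclasses import dataclass, field


@dataclass
class Station:
    """Hypothetical view of a Processing Station as seen by the server."""
    name: str
    free_cores: int
    cached_batches: set = field(default_factory=set)
    can_process: bool = True  # result of the suitability check (factor 3)


def select_station(stations, batch_id):
    """Pick a station for a task that belongs to batch_id, or None."""
    # Factor 3: drop stations that cannot process the task at all.
    candidates = [s for s in stations if s.can_process and s.free_cores > 0]
    if not candidates:
        return None
    # Factor 2 first (batch already in the cache), then factor 1 (free cores).
    return max(
        candidates,
        key=lambda s: (batch_id in s.cached_batches, s.free_cores),
    )


stations = [
    Station("Station 1", free_cores=3, cached_batches={"batch-A"}),
    Station("Station 2", free_cores=4),
    Station("Station 3", free_cores=8, can_process=False),
]

print(select_station(stations, "batch-A").name)  # Station 1 (cache hit)
print(select_station(stations, "batch-B").name)  # Station 2 (most free cores)
```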

Please note that when a task that is usually automated in FlexiCapture Distributed (this can be an import, recognition or export task) is launched manually on an interactive station (such as a Verification Station or a Project Setup Station), it will be processed on that station only, just like in FlexiCapture Standalone. In this case only 2 processor cores can be used for processing the task.

Import tasks are always processed by a single core, like in FlexiCapture Standalone.

The same goes for Export tasks, with one exception: exporting to a PDF file with a text layer. When PDF files are created for export, the documents undergo preprocessing before they are exported. This preprocessing is a separate subtask, and it can use multiple cores if they are available. The actual transfer of files, however, is handled by a single processing core.

Recognition of batches in FlexiCapture Distributed can be carried out on multiple processor cores of a Processing Station, like in the Standalone version. This happens when at least one of the following conditions is met:

  1. The batch is split into documents during the import stage.
  2. Automatic document assembly is disabled for the batch (the batch has the IsExcludedFromAutomaticAssembling flag).
  3. The batch is not split into documents, but the Batch Type does not contain multi-page FlexiLayouts.

If at least one of these 3 conditions is met, the batch will be divided into sets of about 10 documents each (if it has already been split into documents, each set may contain a little more or less than 10 documents, but always a whole number of documents). These sets are then allocated to the Station's free cores.

In all other cases a batch will be processed on a single core of a Processing Station.
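The rule above can be written as a single boolean check. The following Python sketch is illustrative only: the function name and its parameters are hypothetical, and only the IsExcludedFromAutomaticAssembling flag is a name taken from the product.

```python
def can_use_multiple_cores(
    split_on_import: bool,
    excluded_from_automatic_assembling: bool,
    has_multipage_flexilayouts: bool,
) -> bool:
    """Return True if recognition of a batch may run on several cores.

    Mirrors the three conditions listed in the article:
      1. the batch was split into documents during import;
      2. the batch has the IsExcludedFromAutomaticAssembling flag;
      3. the batch is not split, but its Batch Type has no
         multi-page FlexiLayouts.
    """
    return (
        split_on_import
        or excluded_from_automatic_assembling
        or not has_multipage_flexilayouts
    )


# Unsplit batch, assembly enabled, multi-page FlexiLayouts present:
# none of the conditions holds, so a single core is used.
print(can_use_multiple_cores(False, False, True))   # False

# Same batch type, but the batch was split into documents on import.
print(can_use_multiple_cores(True, False, True))    # True
```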

 

Example.

The following example illustrates how workload is distributed between Processing Stations and Cores.

Let us say there are 2 Processing Stations with 4 processing cores each, but the 1st station is configured to use only 3 of its cores. That is: Station 1 has 3 available CPU cores, and Station 2 has 4 available CPU cores.

There are 3 batches in the Processing Server's queue: batch A, batch B and batch C. Batches A and C belong to batch type 1, which does not contain multi-page FlexiLayouts. Batch B is of type 2, which does contain multi-page FlexiLayouts. The Stations' caches do not contain any data about the batches.

The batches will be processed as illustrated below.

[Image: Parallel.png]

1)     Batch A (28 pages, no multi-page FlexiLayouts):

  • The Processing Server determines which station has the most cores available. Station 1 has 3 available cores and Station 2 has 4, so the Processing Server selects Station 2.
  • The program checks if there are multi-page FlexiLayouts for batch A's type. Since there aren't, batch A is split into parts, each of which is processed on one of Station 2's CPU cores.
  • Each core receives a part of about 10 pages, so three cores on Station 2 are occupied by batch A. This leaves 1 free core on Station 2.

2)     Batch B (16 pages, multi-page FlexiLayouts are available)

  • There are 3 cores available on Station 1 and only 1 core available on Station 2, so the Processing Server selects Station 1.
  • There are multi-page FlexiLayouts available for batch B's type, so batch B will be processed on a single core of Station 1. This leaves 2 free cores on Station 1.

3)     Batch C (14 pages, no multi-page FlexiLayouts)

  • The Processing Server selects Station 1, since it has 2 free cores while Station 2 has 1.
  • Since there are no multi-page FlexiLayouts available for batch C's type, the task can be processed on Station 1's cores in parallel.
  • The batch is split into two parts of up to about 10 pages each, so both of the station's remaining cores are occupied.
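The walk-through above can be reproduced with a small Python simulation. It is a sketch under simplifying assumptions, not the Processing Server's real logic: the `simulate` function is hypothetical, the station with the most free cores is always chosen (the caches are empty in this example), and a batch without multi-page FlexiLayouts is assumed to be split into parts of at most 10 pages.

```python
import math


def simulate(free_cores, batches, part_size=10):
    """Allocate batches to stations one by one; return the plan and leftovers.

    free_cores: dict of station name -> number of available cores.
    batches: list of (name, pages, has_multipage_flexilayouts) tuples.
    """
    free = dict(free_cores)  # work on a copy
    plan = []
    for name, pages, has_multipage_flexilayouts in batches:
        # Pick the station that currently has the most free cores.
        station = max(free, key=free.get)
        # Multi-page FlexiLayouts force the whole batch onto one core.
        parts = 1 if has_multipage_flexilayouts else math.ceil(pages / part_size)
        used = min(parts, free[station])
        free[station] -= used
        plan.append((name, station, used))
    return plan, free


plan, remaining = simulate(
    {"Station 1": 3, "Station 2": 4},
    [("A", 28, False), ("B", 16, True), ("C", 14, False)],
)

for name, station, cores in plan:
    print(f"Batch {name}: {station}, {cores} core(s)")
# Batch A: Station 2, 3 core(s)
# Batch B: Station 1, 1 core(s)
# Batch C: Station 1, 2 core(s)
```

After all three batches are allocated, Station 1 has no free cores left and Station 2 has one.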
