Hyperthreading (HT) & OCR Scalability

 

Multicore CPUs

Modern CPUs contain more than one processing unit (CPU core) so that they can execute more tasks in parallel. Modern operating systems can distribute running processes, tasks, and threads efficiently across multiple CPU cores. Even “virtual” CPU cores can increase throughput and speed.

What is Hyper-threading

Hyper-threading (HT) is a technology for CPU cores to improve parallelization of computations (execution of multiple tasks at the same time) performed on PC microprocessors.

See more: Hyper-threading

  • When a CPU with only one physical CPU core executes an OCR or text analytics task, the PC task manager displays one CPU core with a high load:

1core-illu.png

  • When a CPU with one physical core that supports hyper-threading executes the same task, the PC task manager displays 2 logical CPU cores, each with only 50% load:

2core-illu.png

Technically, an HT CPU can work more efficiently, especially when multiple applications are running (e.g. Windows, Outlook, Office applications, and browsers). Most “normal” applications and processes do not need much CPU power, which is why a hyper-threading CPU can switch between the different tasks more efficiently.

From a user's point of view, a system with hyperthreading support reacts faster than a single-core CPU without HT. Several benchmark studies have shown a speed increase, which is why most CPU types support hyperthreading.

High Load Processes and HT

When running optical character recognition (OCR), a very CPU-intensive process, hyperthreading can make a real difference in processing speed.

  • One OCR process can use almost 100% of a physical CPU core's capacity.
    ⇒  The second “logical” HT core therefore cannot “deliver” another 80-90%, as it would when running several less CPU-hungry applications in parallel.
  • Important: A simple (arithmetic) average calculation can give the wrong impression.
    • When one or more (OCR) processes use all the physically available processing power, the task manager shows a load of almost 100%.
    • If you now double the number of cores by enabling hyperthreading, the same almost-100% load appears as only a 50% load because of the averaging. This does not reflect reality.
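The averaging effect can be illustrated with a little arithmetic. The following is a minimal sketch, assuming a simplified model in which each CPU-bound OCR process keeps exactly one logical core fully busy and the displayed value is the plain mean over all logical cores. The function name `displayed_load_percent` is hypothetical and not part of any ABBYY or operating system API.

```python
def displayed_load_percent(saturating_processes, logical_cores):
    """Average load across all logical cores, in percent.

    Simplifying assumption: each CPU-bound process keeps exactly one
    logical core 100% busy and every other logical core is idle.
    """
    busy_cores = min(saturating_processes, logical_cores)
    return 100.0 * busy_cores / logical_cores


# 4 OCR processes on 4 physical cores, hyperthreading disabled.
print(displayed_load_percent(4, 4))  # 100.0

# Same 4 processes, hyperthreading enabled: 8 logical cores are averaged.
print(displayed_load_percent(4, 8))  # 50.0
```

In both cases the physical cores are doing the same amount of work; only the denominator of the average has changed.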

To use an example from real life: hyperthreading can be compared to putting a spoiler on a car. In certain driving situations it might improve the experience, but the engine itself has not received more horsepower to go faster.

Influence of Hyperthreading

  • Hyperthreading CPUs therefore improve the performance of computers running 'standard' applications such as Outlook, browsers, or Office. Here the user can often experience almost doubled performance.
  • Hyperthreading CPUs do not have such a strong effect on speed for CPU-intensive tasks such as OCR processing. Here the gain is at most 20-30% 1).
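As a rough planning aid, the 20-30% figure can be turned into an expected throughput ceiling. This is only a sketch under the assumption that the throughput measured with one process per physical core is known; the function name `ht_ceiling_range` and the example value of 100 pages per minute are hypothetical, and the real gain depends on the CPU type, the documents, and the task.

```python
def ht_ceiling_range(physical_only_ppm, low_gain_percent=20, high_gain_percent=30):
    """Return the range in which the best-case HT throughput is expected.

    physical_only_ppm: pages per minute with one process per physical core.
    The gain percentages are the 20-30% rule of thumb stated above.
    """
    low = physical_only_ppm * (100 + low_gain_percent) / 100
    high = physical_only_ppm * (100 + high_gain_percent) / 100
    return low, high


# Hypothetical machine that reaches 100 pages per minute without HT.
print(ht_ceiling_range(100))  # (120.0, 130.0)
```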

Hyperthreading in ABBYY SDKs

ABBYY FineReader Engine and FlexiCapture SDK come with code samples that show how to use multiple CPU cores.

FineReader Engine Processing Pool

A simple test was made with FineReader Engine 11 Release 5 on a laptop (2012) with a quad-core i7-3720QM, 2.6 GHz, Windows 7, 16 GB RAM, 64-bit 2).

Threads/processes running in the background    1    2    3    4    5    6    7    8
Throughput (pages per minute)                 11   22   26   32   37   37   36   29

Results:

  • More OCR processes increase the throughput, as expected.
  • The maximum page throughput is achieved when the number of processes is “number of physical CPU cores + 1”.
    Here it is: 4 physical cores + 1 additional process = 5
  • If the computer contains more cores, you might see more pages per minute when starting even 2-3 more processes, but the final result also depends on the document size and the OCR scenario you perform.
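The measured values can be checked with a short calculation. The sketch below uses only the numbers from the table above; the variable names are illustrative, and "efficiency" here simply means speedup divided by the number of processes, which is an assumption about how to read the data rather than an ABBYY metric.

```python
# Measured values from the table above (FineReader Engine 11, 4 physical cores).
processes = [1, 2, 3, 4, 5, 6, 7, 8]
pages_per_minute = [11, 22, 26, 32, 37, 37, 36, 29]

PHYSICAL_CORES = 4  # the test laptop has 4 physical cores

baseline = pages_per_minute[0]
for n, ppm in zip(processes, pages_per_minute):
    speedup = ppm / baseline    # relative to a single process
    efficiency = speedup / n    # share of ideal linear scaling
    print(f"{n} processes: {ppm} ppm, speedup {speedup:.2f}x, efficiency {efficiency:.0%}")

# Smallest process count that reaches the peak throughput.
best_ppm = max(pages_per_minute)
best_n = processes[pages_per_minute.index(best_ppm)]

# Gain of the peak over "one process per physical core".
ht_gain = best_ppm / pages_per_minute[PHYSICAL_CORES - 1] - 1

print(best_n)            # 5
print(f"{ht_gain:.0%}")  # 16%
```

On this machine the peak is first reached at 5 processes, and going beyond one process per physical core adds about 16%, which stays below the 20-30% maximum mentioned earlier.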

Screenshots for 1, 4, 5 and 8 OCR Processes

fre-processingpool-1process.png

fre-processingpool-4-processes.png

fre-processingpool-5-processes.png

fre-processingpool-8-processes.png

  • If you have FineReader Engine installed, you can easily reproduce the test with the sample 'FineReader Engines Pool - Multithreading Sample (Windows)'.
  • The sample 'Multi-Processing Recognition - Code Sample (Windows)' lets you test multiple core scalability without the process-pool.

 

____

1) Depends on the CPU type, the documents processed, and the task
2) Absolute numbers might differ on other machines; the purpose here is only to show the influence of a higher number of processes on a hyperthreading CPU.
