OCR Processing Speed

The overall speed of the document recognition process is an important topic for many users, especially when high volumes of scanned (or photographed) pages need to be converted using Optical Character Recognition.

What influences the speed during an OCR process?

  • Optical Character Recognition is a multi-step process. Each processing step can require significant CPU power (e.g. image pre-processing or layout analysis).
  • Processing a high number of images and PDFs might require fast hard disk drive throughput.
  • The processing speed depends on the 'input material', for example the document type, the image quality, and the languages contained in the document. Each of these attributes influences the document processing speed.
  • OCR quality and processing time are closely linked. The general rules are:
    • Recognition quality increases with the processing time invested (in other words, fast recognition might deliver lower-quality recognition results).
    • Low-quality documents need more CPU time and take longer to process than high-quality documents.

[Figure: ocr-speed-accuracy-image-quality.png]

Rules of thumb

  • The better the image quality, the faster the images can be processed
    • Starting with version 10, ABBYY FineReader Engine offers a new fast mode that is especially tuned for good-quality images.
    • If the image quality is not known in advance, it is recommended to use the “balanced mode”. In this mode the technology makes its own “internal” decision.
  • When the image quality is low, significantly more CPU power has to be “invested” to get the best possible results, because more processing steps have to be carried out.
  • Complex layouts need more time for document analysis than rather simple book pages.
  • Reading “low quality” characters takes more time than processing “clean” characters.
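
The rules of thumb above can be read as a small decision rule. The following is a minimal sketch only: the function `choose_profile`, the quality score in the range 0 to 1, the threshold of 0.8, and the profile name "accurate" are all assumptions made for illustration and are not part of the ABBYY FineReader Engine API. Only the ideas of a fast mode for good images and a balanced mode for unknown quality come from the text.

```python
from typing import Optional


def choose_profile(image_quality: Optional[float], threshold: float = 0.8) -> str:
    """Pick a processing profile from an estimated quality score in [0, 1].

    All names and the threshold are hypothetical, for illustration only.
    """
    if image_quality is None:
        # Quality not known in advance: let the engine decide internally.
        return "balanced"
    if image_quality >= threshold:
        # Good images can be processed with the speed-tuned mode.
        return "fast"
    # Low-quality images need more processing steps and more CPU time.
    return "accurate"
```

In practice the quality score would have to come from a separate estimate (for example resolution or contrast checks), which this sketch does not cover.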

Time & Throughput

  • The total processing time is the sum of the times of the individual internal processing steps.
  • In addition, different technologies will produce different processing times (for example different ABBYY technology cycles, or technologies from different vendors).
  • In any case, good and usable recognition results on low-quality images are only possible if the additional processing effort they require is invested.
  • If a technology is tuned only for speed, it will not be able to deliver acceptable results on low-quality documents (additional time will be needed to verify and correct the results).
  • To scale up the throughput, effective use of multiple CPU cores has to be considered.
  • The type of OCR processing also influences the processing time and throughput.
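
The point that the total processing time is the sum of the internal steps can be checked by timing each step separately. The sketch below assumes a pipeline expressed as a list of named Python callables; the helper names `time_steps` and `pages_per_minute` are hypothetical and do not correspond to any ABBYY interface.

```python
import time
from typing import Any, Callable, Dict, List, Tuple

Step = Tuple[str, Callable[[Any], Any]]


def time_steps(steps: List[Step], page: Any) -> Dict[str, float]:
    """Run each step in order and record its wall-clock time in seconds."""
    timings: Dict[str, float] = {}
    data = page
    for name, func in steps:
        start = time.perf_counter()
        data = func(data)  # output of one step feeds the next
        timings[name] = time.perf_counter() - start
    return timings


def pages_per_minute(seconds_per_page: float) -> float:
    """Convert a per-page processing time into single-core throughput."""
    if seconds_per_page <= 0:
        raise ValueError("seconds_per_page must be positive")
    return 60.0 / seconds_per_page
```

Summing the values of the returned dictionary gives the total time per page, and the per-step figures show which step (for example pre-processing or layout analysis) dominates for a given kind of input.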

 

Scalability

ABBYY SDKs offer high scalability. There are several approaches to increase the processing speed.

[Figure: fre10_sample_batch_ocr-4cores.png]
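
One common approach to using multiple cores is to process independent pages in parallel worker processes. The sketch below uses only the Python standard library; `recognize_page` is a hypothetical placeholder for a real OCR call and `process_batch` is an illustrative helper, so neither reflects how the ABBYY SDKs distribute work internally.

```python
import os
from concurrent.futures import ProcessPoolExecutor
from typing import Callable, Iterable, List, Optional


def recognize_page(path: str) -> str:
    """Hypothetical placeholder for a real OCR call on one image file."""
    return f"text of {path}"


def process_batch(
    paths: Iterable[str],
    worker: Callable[[str], str] = recognize_page,
    max_workers: Optional[int] = None,
    executor_cls=ProcessPoolExecutor,
) -> List[str]:
    """Recognize pages in parallel and return results in input order."""
    # Default to one worker per available CPU core.
    workers = max_workers or os.cpu_count() or 1
    with executor_cls(max_workers=workers) as pool:
        # map() distributes pages across workers and preserves order.
        return list(pool.map(worker, paths))
```

When calling `process_batch` from a script with the default process pool, the call should sit under an `if __name__ == "__main__":` guard so that worker processes can be started safely on every platform. Whether throughput scales close to linearly with the number of cores depends on the workload and on disk throughput, as noted earlier.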

 

 

 
