OCR quality of a PDF file on Linux is not as good as on Windows

Usually, OCR quality on Windows and Linux is identical. However, certain PDF files may be recognized with different quality on Linux than on Windows. The reason is that our OCR technology rasterizes a PDF before recognition. This is done by a third-party component (a PDF library), which does not always select the proper fonts. When the PDF library selects a suboptimal font, OCR quality can decrease. To avoid this, you can rasterize the PDF yourself by converting it to TIFF and then run OCR on the resulting image. This can be done with a standard Linux utility such as Ghostscript, for example:

gs -dNOPAUSE -q -r300 -sDEVICE=tiffgray -sCompression=lzw -dBATCH -sOutputFile=Result.tif Original.pdf
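
If you have several PDF files to prepare, the same command can be wrapped in a small script. The following is only a minimal sketch, not part of the product: the function names tiff_name and pdf_to_tiff and the DRY_RUN variable are hypothetical helpers introduced for illustration, and the output naming (Original.pdf becomes Original.tif) is an assumption. The Ghostscript options are the same as in the command above: -r300 sets the resolution to 300 dpi, tiffgray produces a grayscale TIFF, and -sCompression=lzw selects LZW compression.

```shell
#!/bin/sh
# Hypothetical helper: derive the TIFF name from the PDF name
# (a lowercase .pdf suffix is replaced with .tif).
tiff_name() {
    printf '%s.tif\n' "${1%.pdf}"
}

# Hypothetical helper: rasterize one PDF with the same Ghostscript
# options as the command shown in the article.
# Set DRY_RUN=1 to print the command instead of running it.
pdf_to_tiff() {
    out=$(tiff_name "$1")
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo gs -dNOPAUSE -q -r300 -sDEVICE=tiffgray -sCompression=lzw -dBATCH "-sOutputFile=$out" "$1"
    else
        gs -dNOPAUSE -q -r300 -sDEVICE=tiffgray -sCompression=lzw -dBATCH "-sOutputFile=$out" "$1"
    fi
}

# Convert every PDF passed on the command line.
for pdf in "$@"; do
    pdf_to_tiff "$pdf"
done
```

Saved under a name of your choice (for example, pdf2tiff.sh), the script could be run as `sh pdf2tiff.sh First.pdf Second.pdf`. Running it with DRY_RUN=1 first shows the commands that would be executed without creating any files.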
