OCR quality of a PDF file on Linux is not as good as on Windows

Usually, OCR quality on Windows and Linux is identical. However, certain PDF files may be recognized with different quality on Linux than on Windows. This happens because our OCR technology rasterizes PDF documents before recognition. Rasterization is done by a third-party component (a PDF library), which may not always select the proper fonts. When the PDF library picks a suboptimal font, OCR quality can decrease.

To avoid this, you can rasterize your PDF manually by converting it to TIFF. You can do this with Ghostscript, a standard Linux utility, for example as follows:

gs -dNOPAUSE -q -r300 -sDEVICE=tiffgray -sCompression=lzw -dBATCH -sOutputFile=Result.tif Original.pdf
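
If you need to convert many files, you may want to wrap this command in a small script. The following is a minimal sketch, assuming a POSIX shell and that Ghostscript is installed as gs; the function name rasterize_pdf is hypothetical and is not part of the OCR SDK. It uses the same options as the command above: -r300 renders pages at 300 dpi, -sDEVICE=tiffgray produces an 8-bit grayscale TIFF, and -sCompression=lzw applies lossless LZW compression. -dNOPAUSE and -dBATCH make Ghostscript process every page and exit without prompting, and -q suppresses its startup messages.

```shell
#!/bin/sh
# Hypothetical helper (not part of any SDK): rasterize a PDF to a 300 dpi,
# 8-bit grayscale, LZW-compressed TIFF with Ghostscript before OCR.
# Usage: rasterize_pdf INPUT.pdf [OUTPUT.tif]
rasterize_pdf() {
    input="$1"
    # If no output name is given, strip a trailing lowercase ".pdf" and add ".tif"
    output="${2:-${input%.pdf}.tif}"

    # Fail early with a clear message if Ghostscript is not available
    if ! command -v gs >/dev/null 2>&1; then
        echo "Ghostscript (gs) not found; please install it first" >&2
        return 1
    fi

    # Same options as the command shown in this article
    gs -dNOPAUSE -dBATCH -q -r300 \
       -sDEVICE=tiffgray -sCompression=lzw \
       -sOutputFile="$output" "$input"
}
```

For example, rasterize_pdf Original.pdf writes Original.tif, and rasterize_pdf Original.pdf Result.tif writes Result.tif. You then pass the resulting TIFF to recognition instead of the original PDF.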
