Converting PDF to TIF and improving recognition

Hi,

A customer has several PDF documents containing multiple tables that he wants to capture and export to Excel.

I converted each PDF document to a TIF image so that FlexiCapture could recognize and export it.


Because the letters in the PDF document were small, I used Ghostscript to convert the A4 PDF pages to an A2-format TIF image at 400 dpi.
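
For anyone reproducing this, a conversion like the one described could be scripted roughly as follows. This is a minimal sketch, not the exact command used in this post: the file names, the `tiffgray` device choice, and the helper names `build_gs_command` and `convert` are assumptions for illustration. The switches themselves (`-r`, `-sDEVICE`, `-sPAPERSIZE`, `-dFIXEDMEDIA`, `-dPDFFitPage`) are standard Ghostscript options.

```python
import shutil
import subprocess


def build_gs_command(pdf_path, tif_path, dpi=400, device="tiffgray", paper=None):
    """Build a Ghostscript argument list that rasterizes a PDF to a multi-page TIFF.

    The executable name "gs" is an assumption; on Windows it is usually
    "gswin64c" instead.
    """
    cmd = [
        "gs",
        "-dSAFER", "-dBATCH", "-dNOPAUSE",  # run non-interactively and exit when done
        f"-sDEVICE={device}",               # TIFF output device (tiffgray, tiffg4, tiff24nc, ...)
        f"-r{dpi}",                         # rendering resolution in dpi
    ]
    if paper:
        # Force the output media size and scale each PDF page to fit it.
        cmd += [f"-sPAPERSIZE={paper}", "-dFIXEDMEDIA", "-dPDFFitPage"]
    cmd += [f"-sOutputFile={tif_path}", pdf_path]
    return cmd


def convert(pdf_path, tif_path, **kwargs):
    """Run the conversion, failing early if Ghostscript is not installed."""
    if shutil.which("gs") is None:
        raise RuntimeError("Ghostscript executable 'gs' not found on PATH")
    subprocess.run(build_gs_command(pdf_path, tif_path, **kwargs), check=True)


# Hypothetical usage: A4 pages scaled up to A2 at 400 dpi.
# convert("tables.pdf", "tables.tif", dpi=400, paper="a2")
```

Since A2 is exactly twice A4 in each dimension, fitting an A4 page to A2 at 400 dpi gives about the same pixel dimensions as rendering the A4 page directly at 800 dpi (`-r800` with no paper-size switches).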


Although the image has high resolution and no noise, FlexiCapture recognizes only 97% of the characters.


Is the way I converted the PDF to an image correct, or is there a better way?

Thanks in advance,
Sergio Souza

Comments

2 comments

  • Alberto Torino
    Hi Sergio,
    FlexiCapture can import PDF files. Is there a reason you are converting them to TIF? Have you tried importing the PDF files directly?
    What version of FlexiCapture are you using?
    Can you provide sample images and your project?
    I'll be glad to help if I can.

    Regards,
  • Permanently deleted user
    Sergio,

    As atorino mentioned, FlexiCapture natively supports PDF for image import, assuming the file follows the standard PDF format. I've seen some programs create PDF files without the header information, and those don't work in FlexiCapture.

    In any case, without a sample image we really can't say why you are getting 97% accuracy. That's actually not bad at all. It could be something as simple as a zero being mistaken for an "O". It's a confidence level of the engine. If you want, you could lower the threshold in the verification tab of the field, assuming you are getting very good accuracy and the engine is just uncertain. You could also adjust the Data Type to narrow down the recognition engine's choices.
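
To illustrate the zero versus "O" point and the idea of narrowing the choices for a numeric field, here is a minimal sketch that works on exported text outside FlexiCapture. It does not use any FlexiCapture API; the look-alike table, the numeric pattern, and the function name `normalize_numeric` are all assumptions for illustration.

```python
import re

# Letters that OCR engines commonly confuse with digits (illustrative, not exhaustive).
LOOKALIKES = str.maketrans({"O": "0", "o": "0", "l": "1", "I": "1", "S": "5", "B": "8"})

# Optional minus sign, digits, and an optional decimal part with "." or ",".
NUMERIC = re.compile(r"^-?\d+([.,]\d+)?$")


def normalize_numeric(text):
    """Map look-alike letters to digits; return None if the result is still not numeric."""
    candidate = text.strip().translate(LOOKALIKES)
    return candidate if NUMERIC.match(candidate) else None


# Hypothetical cell values from a recognized table column.
for raw in ["1O5,3O", " 4l2 ", "abc"]:
    print(repr(raw), "->", normalize_numeric(raw))
```

Values that come back as None are the ones worth sending to manual verification.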