
Deskewed PDF/A output does not match corrected XML

Hi,

I'm trying to get XML (xmlForCorrectedImage format) and PDF/A (corrected output format) for a source image. Even though the page width and height match between the PDF/A output and the corrected XML, the text coordinates do not match exactly.

Am I doing something wrong?

These are the parameters I'm using:

  "language=%s&profile=%s&imageSource=%s&exportFormat=%s&xml:writeFormatting=%s",
          language, "textExtraction","auto","txt,xmlForCorrectedImage,pdfA","true");

 

Thanks,

Vishnu


Comments


  • Helen Osetrova

    Hi,

     

    A possible reason for the differences in text coordinates is automatic skew correction. To disable it, set the correctSkew parameter of the processImage method to "false". Please note that if the image is actually skewed, the recognition quality may be unsatisfactory.

    In addition, setting the imageSource parameter to "scanner" might help. In this mode, Cloud OCR SDK does not correct possible image distortions, so the coordinates remain the same.

    For more specific recommendations, please send us the source image.
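
As an illustration of the two suggestions above, here is a minimal sketch that builds the processImage query string with correctSkew set to "false" and imageSource set to "scanner". The parameter names and values come from this thread; the class name, method name, and the choice to URL-encode values are assumptions made for the example, not part of the SDK.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: assembles the query string for a processImage request.
final class ProcessImageQuery {

    // Builds "key=value&key=value..." with the parameters discussed in the thread.
    static String buildQuery(String language, boolean correctSkew, String imageSource) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("language", language);
        params.put("profile", "textExtraction");
        params.put("imageSource", imageSource);                 // "auto" or "scanner"
        params.put("correctSkew", String.valueOf(correctSkew)); // "false" disables deskewing
        params.put("exportFormat", "txt,xmlForCorrectedImage,pdfA");
        params.put("xml:writeFormatting", "true");

        StringBuilder query = new StringBuilder();
        for (Map.Entry<String, String> entry : params.entrySet()) {
            if (query.length() > 0) {
                query.append('&');
            }
            query.append(entry.getKey()).append('=').append(encode(entry.getValue()));
        }
        return query.toString();
    }

    // URL-encodes a single value (commas become %2C).
    private static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(buildQuery("English", false, "scanner"));
    }
}
```

With skew correction disabled, the output keeps the geometry of the source image, so this only helps if a deskewed result is not required.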

     

  • Vishnu Vardhan

    Hi Helen,

    Is there any way to get a deskewed image as the output PDF/image, together with exactly matching coordinates in the corrected XML?

    What I mean is: correctSkew should keep its default value (true) and imageSource should be auto. If I then process a skewed sample, I should get the deskewed PDF/image and the corresponding corrected XML as output, and the XML coordinates should match the deskewed PDF/image exactly.

     

    Thanks,

    Vishnu

  • Vishnu Vardhan

    Can someone please follow up on this thread?

  • Aleksandra Zendrikova

    Hi,

    Could you please specify exactly how you compare coordinates between the correctedXml and pdfA files?
    Also, if you could provide us with the source image, it would be easier to find the problem.

  • Vishnu Vardhan

    Hi Sasha,

    I compared the PDF/A file by importing it into an image editing tool (one that shows coordinates as the pointer moves), using exactly the page width and height given in the corrected XML. I will send a sample source image privately.

    Thanks,

    Vishnu
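
One way to cross-check such a comparison without an image editor is to convert the XML rectangle into PDF page units directly. The sketch below assumes that the corrected XML reports coordinates in pixels with the origin at the top-left corner and that the page resolution in DPI is known (for example from a resolution attribute on the page element); both are assumptions to verify against the actual output. PDF default user space uses 1/72 inch units with the origin at the bottom-left. The class and method names are hypothetical.

```java
// Hypothetical helper: maps pixel rectangles (top-left origin) to PDF points (bottom-left origin).
final class CoordinateMapper {

    // PDF default user space: 72 units per inch.
    static final double POINTS_PER_INCH = 72.0;

    // Converts a length in pixels to PDF points at the given resolution.
    static double pixelsToPoints(double pixels, double dpi) {
        if (dpi <= 0) {
            throw new IllegalArgumentException("dpi must be positive");
        }
        return pixels * POINTS_PER_INCH / dpi;
    }

    // Returns {lowerLeftX, lowerLeftY, upperRightX, upperRightY} in PDF points.
    // The vertical axis is flipped because the pixel origin is assumed to be top-left.
    static double[] toPdfRect(int left, int top, int right, int bottom,
                              int pageHeightPx, double dpi) {
        double llx = pixelsToPoints(left, dpi);
        double urx = pixelsToPoints(right, dpi);
        double lly = pixelsToPoints(pageHeightPx - bottom, dpi);
        double ury = pixelsToPoints(pageHeightPx - top, dpi);
        return new double[] {llx, lly, urx, ury};
    }

    public static void main(String[] args) {
        // Example: a 300 DPI page that is 3300 pixels tall.
        double[] rect = toPdfRect(300, 600, 900, 900, 3300, 300.0);
        System.out.printf("%.1f %.1f %.1f %.1f%n", rect[0], rect[1], rect[2], rect[3]);
    }
}
```

If the converted rectangle still misses the text in the PDF/A page, the mismatch is not a units or origin issue.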

  • Aleksandra Zendrikova

    Hi,

    Sorry for the long silence.

    Your issue is not trivial, and I have asked our development team for advice. I will let you know when they find something.

    Until then, you can try using images with a higher resolution (our tests showed that the problem occurred only on low-resolution images).

    You could also use our FineReader Engine solution. For some reason, the coordinate problem does not occur on your image when it is processed with FineReader Engine.

    Unfortunately, that is all I can suggest for now.
    I hope this is useful.

  • Vishnu Vardhan

    "Your issue is not trivial and I have asked for some advice from our development team. I will tell you when they find out something."

    OK, thanks.

