Deskewed PDF/A output does not match the corrected XML

Written by Permanently deleted user

December 04, 2018 07:29

Hi,

I'm trying to get XML (xmlForCorrectedImage format) and PDF/A (corrected output format) for a source image. Even though the page width and height match between the pages of the PDF/A output and the corrected XML, the coordinates do not match exactly.

Am I doing something wrong?

These are the parameters I'm using:

"language=%s&profile=%s&imageSource=%s&exportFormat=%s&xml:writeFormatting=%s",
language, "textExtraction","auto","txt,xmlForCorrectedImage,pdfA","true");

Thanks,

Vishnu


Comments

7 comments

Helen Osetrova

December 26, 2018 15:53
Hi,

A possible reason for the differences in text coordinates is automatic skew correction. To disable it, kindly set the correctSkew parameter of the processImage method to "false". Please note that if the image is actually skewed, the recognition quality might be unsatisfactory.

In addition, setting the imageSource parameter to "scanner" might help. In this mode, Cloud OCR SDK does not correct possible image distortions, so the coordinates remain the same.

For more specific recommendations, kindly provide us with the source image.
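As an illustration of the two suggestions above, here is a minimal Python sketch that builds the processImage query string with skew correction disabled and imageSource set to "scanner". The parameter names and values come from this thread; the helper name build_process_image_query and the language value "English" are assumptions made for the example, and the endpoint URL and authentication are deliberately left out.

```python
from urllib.parse import urlencode


def build_process_image_query(language, correct_skew=True, image_source="auto"):
    """Build the query string for a processImage request.

    Hypothetical helper: only the parameter names and values are taken
    from the thread; everything else is illustrative.
    """
    params = {
        "language": language,
        "profile": "textExtraction",
        "imageSource": image_source,
        "correctSkew": "true" if correct_skew else "false",
        "exportFormat": "txt,xmlForCorrectedImage,pdfA",
        "xml:writeFormatting": "true",
    }
    # Keep ':' and ',' unescaped so the string stays readable.
    return urlencode(params, safe=":,")


# Variant with skew correction off and scanner mode on.
query = build_process_image_query("English", correct_skew=False, image_source="scanner")
print(query)
```

With these settings the output image should not be rotated or otherwise corrected, so this variant is mainly useful for checking whether the coordinate offset comes from the correction step.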

Permanently deleted user

January 21, 2019 06:47
Hi Helen,

Is there any way to get a deskewed image as the output PDF/image, together with its exact coordinates in the corrected XML?

What I mean is: the correctSkew parameter stays at its default (true) and imageSource is auto. If I then process a skewed sample, I should get its deskewed PDF/image and the corresponding corrected XML as output, and the coordinates should match the deskewed output PDF/image exactly.

Thanks,

Vishnu

Permanently deleted user

February 12, 2019 05:54
Can someone please follow up on this thread?

Aleksandra Zendrikova

February 14, 2019 12:20
Hi,

Could you kindly specify how exactly you compare coordinates between correctedXml and pdfA files?
Also, if you could provide us with the source image, it would be easier to find out the problem.

Permanently deleted user

February 14, 2019 13:34
Hi Sasha,

I compared them by importing the pdfA file into an image editing tool (which shows coordinates as the pointer moves), using exactly the width and height given for the page in the corrected XML. I will send a sample source image privately.
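For a numeric version of this comparison, a rough Python sketch follows. It assumes rectangles are given as (left, top, right, bottom) with the origin at the top-left corner, and it rescales a rectangle from the XML page size to the canvas size used in the editor before measuring the offset. The function names and all numbers are hypothetical and are not taken from the actual files.

```python
def scale_rect(rect, src_size, dst_size):
    """Map (left, top, right, bottom) from a page of src_size to dst_size.

    Sizes are (width, height). Assumes both coordinate systems have
    their origin at the top-left corner.
    """
    sx = dst_size[0] / src_size[0]
    sy = dst_size[1] / src_size[1]
    left, top, right, bottom = rect
    return (left * sx, top * sy, right * sx, bottom * sy)


def max_offset(rect_a, rect_b):
    """Largest absolute difference between matching edges of two rectangles."""
    return max(abs(a - b) for a, b in zip(rect_a, rect_b))


# Hypothetical values: a word box read from the XML on a 2480x3508 page,
# and the same word measured by hand on a 1240x1754 canvas.
xml_rect = (200, 400, 1000, 460)
expected = scale_rect(xml_rect, (2480, 3508), (1240, 1754))
measured = (103, 200, 500, 231)
print(expected, max_offset(expected, measured))
```

If the two page sizes are identical, the scaling step changes nothing and max_offset alone gives the mismatch in pixels.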

Thanks,

Vishnu

Aleksandra Zendrikova

February 27, 2019 17:56
Hi,

Sorry for the long silence.

Your issue is not trivial and I have asked for some advice from our development team. I will tell you when they find out something.

Until then, you can try using images with a higher resolution (our tests showed that the problem occurred only on low-resolution images).

Also, you can use our FineReader Engine solution. For some reason, there is no problem with coordinates on your image when processing with FineReader Engine.

Unfortunately, that is all I can suggest for now.
Hope this will be useful.

Permanently deleted user

March 04, 2019 06:30
"Your issue is not trivial and I have asked for some advice from our development team. I will tell you when they find out something."

Ok thanks.
