I'm trying to use Cloud OCR SDK to convert PDF files to text so that I can get structured HTML instead of XML, which contains tag/position information for (almost) every character.
I'd like to ask: - Do you know of an XSLT map that we can use to convert the XML to HTML? - Does ABBYY intend to provide such a feature directly in the near future?
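Until a direct HTML export exists, one way to experiment is to post-process the XML yourself. The sketch below is not an XSLT map and not an ABBYY tool. It is a minimal Python equivalent that assumes the Cloud OCR SDK XML follows the FineReader XML layout (`block` elements with a `blockType` attribute, containing `par` > `line` > `charParams`, one character per `charParams`). Those element names are assumptions to verify against your own output. The namespace is matched by local name only, so the exact schema URI does not matter. Only `Text` blocks are kept, so pictures, separators and tables are skipped.

```python
import html
import xml.etree.ElementTree as ET


def local(tag: str) -> str:
    """Strip the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def par_text(par: ET.Element) -> str:
    """Join the characters of each <line> in a paragraph, one space between lines."""
    lines = []
    for line in par.iter():
        if local(line.tag) != "line":
            continue
        # Assumption: each charParams element holds exactly one recognized character.
        chars = [c.text or "" for c in line.iter() if local(c.tag) == "charParams"]
        lines.append("".join(chars).strip())
    return " ".join(ln for ln in lines if ln)


def abbyy_xml_to_html(xml_string: str) -> str:
    """Emit one <p> per paragraph found in blockType="Text" blocks; other blocks are ignored."""
    root = ET.fromstring(xml_string)
    body = []
    for block in root.iter():
        if local(block.tag) != "block" or block.get("blockType") != "Text":
            continue
        for par in block.iter():
            if local(par.tag) == "par":
                text = par_text(par)
                if text:
                    body.append(f"<p>{html.escape(text)}</p>")
    return "<html><body>\n" + "\n".join(body) + "\n</body></html>"


if __name__ == "__main__":
    # Hypothetical, heavily trimmed sample; real exports carry many more attributes.
    sample = (
        '<document xmlns="http://example.com/hypothetical-ns"><page>'
        '<block blockType="Text"><text><par><line><formatting>'
        "<charParams>H</charParams><charParams>i</charParams>"
        "</formatting></line></par></text></block>"
        '<block blockType="Picture"/></page></document>'
    )
    print(abbyy_xml_to_html(sample))
```

Because the per-character position attributes are simply dropped, the result keeps reading order and paragraph breaks but none of the original layout or formatting.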
It is possible to add support for an HTML export format without pictures.
To work out a solution, our analyst needs the following information:
We are not interested in the image parts of the PDFs, such as backgrounds, logos, footers and separators. What matters to us is the text, from which we can extract information.
Is there any progress or development that you can share on this subject?
The analyst said that HTML export format should be added, but it will take some time, so he recommends to use the following workaround:
Hello. Can you please tell me whether PDF to HTML conversion has been implemented yet? If so, can you point me to documentation, samples or any other information that will help me do such a conversion?
"convert the pdf with the recognized text to HTML as it is described in this post." But that post says it's impossible.
Any progress on this issue?
You can convert PDF TextAndImages to HTML5 by means of PDF to HTML5 Converter.
Is this method appropriate for you?
Hello. Thank you for the response. So ABBYY does not have such a service, am I right?
Unfortunately at the moment we don't have such functionality.
Please create a feature request and describe your scenario there. Do you need to preserve formatting or pictures?
I have created a feature request for HTML export. Please vote there. Hope this functionality will be added in the future.
Thank you. I couldn't find how to vote there, so I just added a new comment; I hope this will help.