[Cloud] Abbyy OCR using Python Answered

Hi there.

I'm using a third-party wrapper rather than your Python sample on GitHub, because yours is for Python 2.7 and I'm using Python 3.5.

Anyhow, I just want to grab the text and barcodes and get them back as XML. No matter which profile I use, I keep getting back the full XML, with a lot of stuff I don't need or want.

Do I have to parse through all the blocks, par, line, charParams?

Isn't there just a simple XML format like: <text>The OCR read text</text> <barcode>value of a barcode</barcode>?

I thought by changing my profile to documentArchiving or textExtraction it would give me something like that.

I don't care about the structure of the document. I just want ALL the text it can find and potentially any barcodes.

Thanks, Marcus




  • Oksana Serdyuk

    As I already answered by email, ABBYY Cloud OCR SDK supports only this XML export format, and there are currently no plans to add another variant of XML export. You can create your own file in the format you need from our XML output.
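For readers who want to do this in Python 3, here is a minimal sketch of reducing the full XML export to the flat <text>/<barcode> form asked for above. It relies only on the element names mentioned in this thread (block, line, charParams). The `blockType="Barcode"` attribute check and the function name `simplify` are assumptions made for illustration, so check them against the XML you actually receive.

```python
import xml.etree.ElementTree as ET


def local(tag):
    """Strip any '{namespace}' prefix so tags match whatever the schema version."""
    return tag.rsplit("}", 1)[-1]


def line_text(line):
    """Return the text of one <line> element."""
    chars = [el for el in line.iter() if local(el.tag) == "charParams"]
    if chars:
        # Per-character export: each charParams element holds one character.
        return "".join(el.text or "" for el in chars)
    # Otherwise fall back to whatever text sits inside the line.
    return "".join(line.itertext()).strip()


def simplify(xml_string):
    """Build a flat <document> with one <text> or <barcode> child per block."""
    root = ET.fromstring(xml_string)
    out = ET.Element("document")
    for block in root.iter():
        if local(block.tag) != "block":
            continue
        # ASSUMPTION: barcode blocks are marked with blockType="Barcode".
        kind = "barcode" if block.get("blockType") == "Barcode" else "text"
        lines = [line_text(el) for el in block.iter() if local(el.tag) == "line"]
        lines = [text for text in lines if text]
        if lines:
            ET.SubElement(out, kind).text = "\n".join(lines)
    return ET.tostring(out, encoding="unicode")
```

Because the lookup is by descendant name rather than by a fixed path, the sketch does not care which wrapper elements sit between a block and its lines.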

  • Avatar

    Thanks for the answer, Oksana. I have now parsed your XML output into my own format. My biggest concern is that you might change the current XML format: if you did (say, by removing the <par> tags), my parser would break.
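One way to notice such a change early is a small guard that runs before the parser and reports which expected element names are missing from a response. This is only a sketch: the function name `missing_tags` and the default list of names are hypothetical and should be adjusted to whatever the parser depends on.

```python
import xml.etree.ElementTree as ET


def missing_tags(xml_string, expected=("block", "par", "line")):
    """Return the expected element names that do not occur in the document."""
    root = ET.fromstring(xml_string)
    # Compare local names only, so a namespace change alone does not trip the check.
    seen = {el.tag.rsplit("}", 1)[-1] for el in root.iter()}
    return [name for name in expected if name not in seen]
```

If the returned list is not empty, the parser can stop with a clear error instead of silently producing empty output.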

