How to use the ABBYY OCR SDK with Python?

Hi All, 

I am new to ABBYY and Python.

I need to convert an image PDF to text. I have tried a few samples.

Sample code below:

from ABBYY import CloudOCR

# Credentials are masked here
ocr_engine = CloudOCR(application_id='b9fc6c0*******', password='h********')
pdf = open("filename.pdf", 'rb')
# Map the file name to the open file handle; {pdf} alone would build a set, not a dict
file = {pdf.name: pdf}
result = ocr_engine.process_and_download(file, exportFormat='txt,pdfTextAndImages', language="ChinesePRC")

And I am getting the output as follows:

{'txt': <_io.BytesIO object at 0x03AEF270>, 'pdfTextAndImages': <_io.BytesIO object at 0x031ADCF0>}


Can someone please tell me how to unwrap this, i.e. turn these objects into something readable in Python?
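
For reference, the values in that dictionary look like ordinary in-memory io.BytesIO buffers from the Python standard library, so reading them may be all that is needed. Below is a minimal sketch of that idea. It does not call ABBYY at all: fake_result is a stand-in I made up with the same shape as the output above, read_ocr_result is a hypothetical helper name, and the UTF-8 encoding of the 'txt' export is an assumption that may need adjusting.

```python
import io


def read_ocr_result(result, encoding="utf-8-sig"):
    """Return (text, pdf_bytes) from a dict of BytesIO buffers.

    Hypothetical helper. The encoding is an assumption: "utf-8-sig"
    decodes UTF-8 and drops a leading byte order mark if one is present.
    """
    txt_buffer = result["txt"]
    txt_buffer.seek(0)  # rewind in case the buffer was already read
    text = txt_buffer.read().decode(encoding)

    # The searchable PDF is binary, so keep it as bytes instead of decoding it
    pdf_buffer = result.get("pdfTextAndImages")
    pdf_bytes = pdf_buffer.getvalue() if pdf_buffer is not None else None
    return text, pdf_bytes


# Stand-in for the dictionary returned by process_and_download (made-up data)
fake_result = {
    "txt": io.BytesIO("你好, ABBYY".encode("utf-8")),
    "pdfTextAndImages": io.BytesIO(b"%PDF-1.4 placeholder"),
}

text, pdf_bytes = read_ocr_result(fake_result)

# text is now a normal str, so the usual string operations apply
print(text.split(","))
```

If the PDF export should be kept, pdf_bytes could be written to disk with open("output.pdf", "wb").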

My use case is:

I have an image PDF (scanned), which I need to convert to text and do some string operations.

I am currently using the trial version. We already have a corporate licence, which I will switch to once I get a positive result.


Thanks a lot in advance :)



