
How to use API?

How to use this API?

https://www.ocrsdk.com/documentation/api-reference/process-image-method-v2/

[POST] https://<PROCESSING_LOCATION_ID>.ocrsdk.com/v2/processImage

I can't make sense of this tutorial. Suppose I want to connect to http://cloud-eu.ocrsdk.com/

So what will the URL be? Where do I enter the ApplicationId and Password?

Should the image be sent in the body of the POST request?
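A minimal sketch of what the call could look like with only the Python standard library. It assumes that `processImage` takes the image bytes as the raw POST body, that the ApplicationId and Password go in an HTTP Basic `Authorization` header (the `nnnnnn:xxxx@` form in the URL is the same mechanism), and that `language` and `exportFormat` are query parameters. `build_process_image_request` is a hypothetical helper; it only builds the request, and nothing is sent until it is passed to `urllib.request.urlopen`.

```python
import base64
import urllib.parse
import urllib.request


def build_process_image_request(location, app_id, password, image_bytes,
                                language="English", export_format="txt"):
    """Build (but do not send) a POST request for /v2/processImage.

    Assumptions: the image is the raw request body, and the credentials
    use HTTP Basic authentication (ApplicationId as user, Password as password).
    """
    query = urllib.parse.urlencode(
        {"language": language, "exportFormat": export_format})
    url = f"https://{location}.ocrsdk.com/v2/processImage?{query}"

    # HTTP Basic auth header value: "Basic " + base64("ApplicationId:Password")
    credentials = f"{app_id}:{password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")

    return urllib.request.Request(
        url,
        data=image_bytes,  # the image bytes are the whole request body
        method="POST",
        headers={
            "Authorization": f"Basic {token}",
            "Content-Type": "application/octet-stream",
        },
    )


# Usage (sends the request, so it needs real credentials and network access):
# with open("scan.png", "rb") as f:
#     req = build_process_image_request("cloud-eu", "my_app_id", "my_password", f.read())
# with urllib.request.urlopen(req) as resp:
#     print(resp.read().decode("utf-8"))  # JSON receipt with taskId, status, ...
```

Building the credentials into a header rather than into the URL keeps the password out of logs and browser history, which matters once the URL gets pasted anywhere.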

 

EDIT:

After a lot of trial and error I found somewhere that the URL is:

https://nnnnnn:xxxx@cloud-eu.ocrsdk.com/v2/processImage/?exportFormat=txt

As a result I get JSON, but how do I get the actual result?!

I do not get any URL, as described here:

https://www.ocrsdk.com/documentation/specifications/status-codes-v2/

Just a taskId. What should I do with that?

Why are such simple things so complicated?!
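What to do with the `taskId` can be sketched as a polling loop. This assumes, based on the status-codes page linked above, that `GET /v2/getTaskStatus?taskId=...` returns the same JSON shape as `processImage`, that `Submitted`, `Queued` and `InProgress` mean the task is still running, and that `requestStatusDelay` is the suggested wait in milliseconds. `poll_task` is a hypothetical helper; the HTTP call is passed in as `fetch_status` so the loop can be tried without credentials.

```python
import time
from typing import Callable, Dict

# Assumption: statuses meaning "still being processed" in the v2 API.
PENDING_STATUSES = {"Submitted", "Queued", "InProgress"}


def poll_task(task_id: str,
              fetch_status: Callable[[str], Dict],
              sleep: Callable[[float], None] = time.sleep,
              max_polls: int = 60) -> Dict:
    """Call fetch_status(task_id) until the task leaves the pending states.

    Returns the final task JSON (Completed, ProcessingFailed, ...);
    the caller decides what to do with it.
    """
    for _ in range(max_polls):
        task = fetch_status(task_id)
        if task["status"] not in PENDING_STATUSES:
            return task
        # Wait as long as the server suggests (milliseconds), but at least 1 s.
        delay_ms = max(task.get("requestStatusDelay", 0), 1000)
        sleep(delay_ms / 1000.0)
    raise TimeoutError(f"task {task_id} still pending after {max_polls} polls")
```

In real use, `fetch_status` would send an authenticated GET to `https://cloud-eu.ocrsdk.com/v2/getTaskStatus?taskId=...` and return the parsed JSON; injecting it keeps the retry logic separate from the networking.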


Comments

4 comments

  • Koen de Leijer

    Hi

    I was waiting for an answer to your previous question

    - https://forum.ocrsdk.com/thread/executing-in-python-stalls/

    until I saw these two new ones.

    - https://forum.ocrsdk.com/thread/python-subprocess-run-what-i-should-get-as-a-result/

    - https://forum.ocrsdk.com/thread/how-to-use-api/

     

    With Python it should not be that hard after all. Please follow these steps:

    https://www.ocrsdk.com/documentation/quick-start-guide/python-ocr-sdk/

    Then tell me what output you get after step 5.


    Or have a look at another ABBYY sample:

    - https://pypi.org/project/ABBYY/

    - https://github.com/samueltc/ABBYY


    Best regards

    Koen de Leijer

     

  • Puchatekkubus

    Your answer is not to the point. I am asking about the HTTP API.

    I started with Python, but since that code does not handle errors I decided to use the HTTP protocol directly. Forgive me, but your API tutorial is a complete mess. As an API user I always start with authorization (not a word about it in the main tutorial). Then I'd like to know what each method returns and how to read it. Nothing about that either. The next-level tutorial says the 'processImage' method returns a URL. Not true: there is no URL in the result JSON.
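    A sketch of where the URL is, assuming the v2 behaviour described on the status-codes page: `processImage` only returns a task receipt, and the download links show up in a `resultUrls` array of the `getTaskStatus` response once `status` is `Completed`. `result_urls` and `download` are hypothetical helpers, and `download` sends no credentials on the assumption that result URLs are pre-signed storage links.

```python
import urllib.request
from typing import Dict, List


def result_urls(task: Dict) -> List[str]:
    """Return download links from a getTaskStatus JSON response.

    Assumption: links are listed under "resultUrls" only when the task
    status is "Completed".
    """
    if task.get("status") != "Completed":
        raise ValueError(f"task not completed yet: status={task.get('status')!r}")
    urls = task.get("resultUrls") or []
    if not urls:
        raise ValueError("completed task has no resultUrls")
    return list(urls)


def download(url: str) -> bytes:
    """Fetch one result file (no credentials sent; assumed pre-signed URL)."""
    with urllib.request.urlopen(url) as resp:
        return resp.read()
```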

     

    BTW, every attempt to post starts with 'Error creating post'. I have to click at least twice to post (sometimes a page reload is needed).

     

  • Puchatekkubus

    I send the image in a POST request to this URL:

    https://nnnnnn:xxxx@cloud-eu.ocrsdk.com/v2/processImage/?exportFormat=txt

    As a result I get this response:

    {"taskId":"1cxxxxc0-cbfe-4b64-ae57-7a46f6682f1","registrationTime":"2019-12-19T16:20:06Z","statusChangeTime":"2019-12-19T16:20:06Z","status":"Queued","filesCount":1,"requestStatusDelay":10000}
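    This response is a receipt for the queued task, not the recognized text. A sketch that extracts the two fields needed for the next step, assuming `requestStatusDelay` is in milliseconds (so 10000 means waiting roughly ten seconds before the first status request). `TaskReceipt` and `parse_receipt` are hypothetical names.

```python
import json
from dataclasses import dataclass


@dataclass
class TaskReceipt:
    task_id: str
    status: str
    delay_seconds: float  # assumed: requestStatusDelay is in milliseconds


def parse_receipt(body: str) -> TaskReceipt:
    """Pull the task id, status and suggested polling delay from the JSON."""
    data = json.loads(body)
    return TaskReceipt(
        task_id=data["taskId"],
        status=data["status"],
        delay_seconds=data.get("requestStatusDelay", 0) / 1000.0,
    )
```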

    Then I call this method:

    https://nnnnnn:xxxx@cloud-eu.ocrsdk.com/v2/getTaskStatus/?taskId=1cxxxxc0-cbfe-4b64-ae57-7a46f6682f1

    BUT I get an error:

    {"taskId":"1cxxxxc0-cbfe-4b64-ae57-7a46f6682f1","registrationTime":"2019-12-19T16:19:00Z","statusChangeTime":"2019-12-19T16:19:01Z","status":"ProcessingFailed","error":"Internal error","filesCount":1,"requestStatusDelay":0}

    ?
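    Since `ProcessingFailed` is a final status, polling the same task again should not change it; the `error` field (`Internal error` here) is the only detail returned. A sketch that turns such a response into an exception instead of treating it as pending, assuming `ProcessingFailed` and `NotEnoughCredits` are the failure statuses on the status-codes page. `OcrTaskFailed` and `check_task` are hypothetical names.

```python
import json
from typing import Dict

# Assumed final failure statuses in the v2 API.
FAILED_STATUSES = {"ProcessingFailed", "NotEnoughCredits"}


class OcrTaskFailed(RuntimeError):
    """Raised when a task reaches a failure status."""


def check_task(body: str) -> Dict:
    """Parse a getTaskStatus response and raise if the task failed."""
    task = json.loads(body)
    if task.get("status") in FAILED_STATUSES:
        raise OcrTaskFailed(
            f"task {task.get('taskId')} {task['status']}: "
            f"{task.get('error', 'no error message')}"
        )
    return task
```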

  • Koen de Leijer

    Due to circumstances I was not able to respond earlier.
    Keep in mind that I am not affiliated with ABBYY; I am volunteering to help on this forum.
    Here is a working example (relying on https://pypi.org/project/ABBYY/ ):

    A wrapper that needs the parameters ABBYY provided to you (replace the ..._AS_PROVIDED placeholders):

    from ABBYY import CloudOCR


    class ABBYYWrapper(object):

        def __init__(self, pdf_):
            self._pdf = pdf_
            # the Cloud OCR SDK takes language names such as 'English', not ISO codes
            self._language = 'English'
            self._exportFormat = 'pdfSearchable'
            self._cloudurl = 'ABBYY_CLOUD_URL_AS_PROVIDED'
            self._cloudapplicationid = 'ABBYY_CLOUD_ID_AS_PROVIDED'
            self._cloudpassword = 'ABBYY_CLOUD_PASSWORD_AS_PROVIDED'

        def process_and_download(self):
            """
            Performs the OCR on the PDF via the Cloud
            """

            # set file-pointer to first byte of the file
            self._pdf.seek(0)

            # create a dictionary holding  the PDF
            post_file = {'ocred_pdf': self._pdf.read()}

            # get handle to ABBYYs CloudOCR
            ocr_engine = CloudOCR(
                application_id=self._cloudapplicationid,
                password=self._cloudpassword)

            # override URL of ABBYYs CloudOCR
            ocr_engine.base_url = self._cloudurl

            # process the PDF and download/return the result
            result = ocr_engine.process_and_download(
                file=post_file,
                exportFormat=self._exportFormat,
                language=self._language)
            return result

    The wrapper can be called like this:

    from io import BytesIO

    # ABBYYWrapper is the class above, saved as abbyy.py in the same package
    from .abbyy import ABBYYWrapper


    def perform_ocr(file_obj):
        """Performs OCR on a PDF with ABBYY.

        :param file_obj: a file object open for reading.
        :return: a file object open for reading containing the OCRed PDF.
        """

        ocr_engine = ABBYYWrapper(file_obj)
        ocr_result = ocr_engine.process_and_download()

        # expect exactly one stream, since only one export format is requested
        if not ocr_result:
            raise Exception("No stream found by OCR engine")
        elif len(ocr_result) > 1:
            raise Exception("Multiple streams found by OCR engine")
        return next(iter(ocr_result.values()))


    def get_ocred_pdf(file_obj):
        # copy the result stream into a fresh in-memory file for the caller
        with perform_ocr(file_obj) as f:
            ocr_data = f.read()
        return BytesIO(ocr_data)

    In my case it will convert a scanned PDF to a searchable PDF.

