Why is OCR SDK processImage returning bad results for a receipt?

Hi,

I'm trying to process a receipt and am getting very poor results on one particular image. I'm calling processImage with exportFormat set to txt, correctSkew set to false, and imageSource set to scanner. Below is the image I'm processing:

[image]

and the results being returned are:

[image]

As you can see, a lot of the item descriptions are missing, some of the amounts don't have values for cents, there are extra spaces in the return, among other issues. What can I do to get better results?
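
A minimal sketch of how the call described above could be assembled. The host https://cloud.ocrsdk.com and the helper name process_image_url are assumptions for illustration; only the three parameters and their values come from the question.

```python
from urllib.parse import urlencode

# Assumed service host; confirm it against your own account documentation.
BASE_URL = "https://cloud.ocrsdk.com"


def process_image_url(**params):
    """Return a processImage URL carrying the given query parameters."""
    return f"{BASE_URL}/processImage?{urlencode(params)}"


# The settings named in the question.
url = process_image_url(exportFormat="txt", correctSkew="false", imageSource="scanner")
print(url)
```

Because the receipts are phone photos rather than scans, it may be worth checking the API reference for a photo-oriented imageSource value and testing with correctSkew enabled. Both are things to try, not confirmed fixes.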

11 comments

  • Oksana Serdyuk

    Hi,

    Please try the textExtraction profile for your scenario. This profile is suitable for extracting all text from the input image.

    Note that the red oval prevents ABBYY Cloud OCR SDK from accurately recognizing the text above and below the line "Age Confirmed - 12/12/1912". This is expected behavior.
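
One way to add the profile to an existing request URL is sketched below. The helper with_profile and the example host are hypothetical; only the profile parameter and the textExtraction value come from this thread.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_profile(url, profile="textExtraction"):
    """Return url with its 'profile' query parameter set (or replaced)."""
    parts = urlsplit(url)
    query = dict(parse_qsl(parts.query))
    query["profile"] = profile
    return urlunsplit(parts._replace(query=urlencode(query)))


# Example host is an assumption, used only to have a URL to modify.
before = "https://cloud.ocrsdk.com/processImage?exportFormat=txt"
after = with_profile(before)
print(after)
```

Keeping the profile as a single switchable parameter makes it easy to submit the same image under two profiles and compare the output.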

  • ppunzalan

    Hi Oksana,

    Please see my answer below, as I cannot include images when commenting on your answer (limitation of the forum).

    Thanks.

  • ppunzalan

    Hi Oksana,

    I tried adding profile=textExtraction, and this particular receipt is getting better results. Here is what was returned:

    [image]

    However, other receipts are getting bad results with profile=textExtraction. For example, when I submit this image

    [image]

    I was getting these results (without profile=textExtraction)

    [image]

    but now I'm getting these results (with profile=textExtraction)

    [image]

    As you can see, I'm losing the Subtotal, the line item amounts (and those that are read are still read incorrectly), the Total amount, etc. Is there one call to read a receipt that will work on all receipts?

  • Oksana Serdyuk

    We have tested your images and sent our results and recommendations to you by e-mail.

  • ppunzalan

    Hi Oksana,

    As suggested in your email response (which I've attached below), I have already tried setting profile=textExtraction, with mixed results. You also state "try to find more optimal recognition settings for your kind of images", but that's what I'm asking your advice on. What would those settings be?

    You also suggest using better image quality, but I'm trying to process receipts that clients will photograph with their mobile phones and then email to a server for processing. I believe your ABBYY FineReader 12 is a desktop application, which isn't an option since all processing is online. Is there a parameter that can be passed to ABBYY Cloud OCR SDK to make the SDK improve the image quality?

    Are there any other suggestions you might have to make ABBYY Cloud OCR SDK work for me?

    Thanks.


    Hi Pamela,

    Thank you for your interest in our product.

    We are writing to you regarding your question on the ABBYY Cloud OCR SDK forum. To achieve better recognition results, we advise you to take care of the quality of the source images and to look for recognition settings better suited to your kind of images. Below are our recommendations, which you can use as a starting point.

    First, note that your images have quite a low resolution for recognition. Image resolution has a real impact on the OCR quality that can be achieved. We changed the resolution of your image to more suitable values using the Resolution tool in the ABBYY FineReader 12 Image Editor. Please review the OCR - Optimal Image Resolution article to learn more about the recommended resolution values for OCR.
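
As a rough illustration of the resolution point, the sketch below estimates the effective resolution of a receipt photo and the scale factor needed to reach a target. The 300 DPI target, the 3-inch receipt width, and both function names are assumptions for illustration; the email does not state specific values.

```python
def effective_dpi(pixels, inches):
    """Pixels available per inch of the physical receipt."""
    return pixels / inches


def upscale_factor(current_dpi, target_dpi=300):
    """Scale factor needed to reach target_dpi (never below 1.0)."""
    return max(1.0, target_dpi / current_dpi)


# A receipt assumed to be 3 inches wide, spanning 480 pixels in the photo.
dpi = effective_dpi(480, 3.0)
factor = upscale_factor(dpi)
print(dpi, factor)
```

Resampling does not add detail that the camera never captured, so asking users to photograph the receipt closer up is likely to help more than enlarging a small image afterwards.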

    Also, as we have already written on our forum, we usually recommend the textExtraction profile for your usage scenario. This profile is better suited to receipt processing, as it provides better results in both recognition quality and processing speed. Moreover, it is suitable for extracting all text from the input image, including small text areas of low quality.

    We have tested your images and managed to achieve quite good recognition results using our above recommendations. Please find our results in the attachment:

    The Images folder contains your original image and our images after FineReader 12 preprocessing. The Results folder contains two subfolders, textExtraction and documentConversion, which hold the OCR results we obtained using the processImage method with the corresponding profiles.

    Hope the information is useful.

    If you have any technical issues, please visit our Developer Forum to get fast help from ABBYY Cloud OCR SDK developers’ community. Follow us on Twitter to get the latest news.

    Kind regards,
    Oksana Serdyuk
    Technical Support Engineer

  • Oksana Serdyuk

    We offer ABBYY Mobile Imaging SDK, which you can use for image preprocessing on mobile devices.

  • ppunzalan

    That will not work for us since we need to streamline the process for our end users. Do you have plans to support imaging in the API in the future? Is there someone I can contact directly at ABBYY to speak to about this issue?

  • Oksana Serdyuk

    So far there are no plans to support image preprocessing in ABBYY Cloud OCR SDK. However, I've forwarded your contact info to my colleagues in our office in your region. They will contact you soon to discuss the issue.

  • rainerp

    Hi Pamela,

    We have a method in beta that extracts the data from receipts and returns it in an XML structure.

    Cheers,

    Rainer

  • ppunzalan

    Thanks Rainer,

    I was recently told about this module and after performing some testing, I've found it does a much better job than the processImage option.

  • rainerp

    The method for receipt capture is now officially released for the USA. For other countries it is still in beta. Please see more information here: http://ocrsdk.com/documentation/apireference/processReceipt/

    and here

    https://www.abbyy.com/receipt-capture-ocr/
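
A hedged sketch of preparing a processReceipt request follows. The host, the country parameter and its value, the use of HTTP Basic authentication with an application ID and password, and the helper name build_receipt_request are all assumptions to verify against the API reference linked above. The request is only constructed here, not sent.

```python
import base64
from urllib.parse import urlencode
from urllib.request import Request


def build_receipt_request(image_bytes, app_id, password, country="usa"):
    """Build (but do not send) a POST request for the processReceipt method."""
    # Host and 'country' parameter are assumptions; check the API reference.
    url = "https://cloud.ocrsdk.com/processReceipt?" + urlencode({"country": country})
    token = base64.b64encode(f"{app_id}:{password}".encode()).decode()
    req = Request(url, data=image_bytes, method="POST")
    req.add_header("Authorization", f"Basic {token}")
    req.add_header("Content-Type", "application/octet-stream")
    return req


# Placeholder bytes and credentials, for illustration only.
req = build_receipt_request(b"fake image bytes", "my-app-id", "my-password")
print(req.get_method(), req.full_url)
```

Passing the prepared request to urllib.request.urlopen would submit it; building it separately keeps the credentials and parameters easy to inspect before anything goes over the network.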
