I was trying out the sample code for the Python implementation of the ABBYY OCR SDK (https://github.com/abbyysdk/ocrsdk.com/tree/master/Python). Specifically, I wanted to use the processReceipt method to extract fields from a receipt and get an XML file that I can parse later on.
In AbbyyOnlineSdk.py, I changed ServerUrl (line 38) to "https://cloud-westus.ocrsdk.com/v2/processReceipts", following this guide: https://www.ocrsdk.com/documentation/api-reference/process-receipt-method-v2/. I then uploaded a sample receipt (a JPG file), but got the following error:
File "process.py", line 111, in <module>
File "process.py", line 105, in main
recognize_file(source_file, target_file, language, output_format)
File "process.py", line 39, in recognize_file
task = processor.process_image(file_path, settings)
File "C:\Users\domer\Desktop\receipt_separation\ABBY OCR SDK\AbbyyOnlineSdk.py", line 65, in process_image
task = self.decode_response(response.text)
File "C:\Users\domer\Desktop\receipt_separation\ABBY OCR SDK\AbbyyOnlineSdk.py", line 94, in decode_response
dom = xml.dom.minidom.parseString(xml_response)
File "C:\Users\domer\AppData\Local\Programs\Python\Python38-32\lib\xml\dom\minidom.py", line 1969, in parseString
File "C:\Users\domer\AppData\Local\Programs\Python\Python38-32\lib\xml\dom\expatbuilder.py", line 925, in parseString
File "C:\Users\domer\AppData\Local\Programs\Python\Python38-32\lib\xml\dom\expatbuilder.py", line 223, in parseString
xml.parsers.expat.ExpatError: not well-formed (invalid token): line 22, column 59
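The ExpatError is raised because the body returned by the server is not well-formed XML, so one way to narrow this down is to look at what actually came back. Below is a minimal sketch of a defensive parse that could stand in for the parseString call in decode_response. The helper name parse_xml_or_report and the preview_chars parameter are hypothetical and not part of the ABBYY sample; what the body really contains for this URL (an HTML error page, JSON, or something else) is an assumption I have not verified.

```python
import xml.dom.minidom
from xml.parsers.expat import ExpatError


def parse_xml_or_report(xml_response, preview_chars=500):
    """Parse a response body as XML.

    Hypothetical helper (not part of the ABBYY sample code): returns the
    parsed DOM on success, or prints a preview of the raw body and returns
    None when the body is not well-formed XML.
    """
    try:
        return xml.dom.minidom.parseString(xml_response)
    except ExpatError as err:
        # err.lineno and err.offset give the position where parsing stopped
        print("Response is not well-formed XML: %s" % err)
        print("Stopped at line %d, column %d" % (err.lineno, err.offset))
        # Show the start of the body so its real format can be identified
        print(xml_response[:preview_chars])
        return None
```

Calling this with response.text in place of the direct parseString call would print the beginning of the unexpected body instead of stopping with a traceback. Printing response.status_code alongside it (assuming response is a requests response object, as response.text suggests) may also show whether the request reached the intended endpoint.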
Before making that change, I had ServerUrl set to "https://cloud-westus.ocrsdk.com", and with that value the same script ran successfully and extracted the text:
py process.py "receipt.jpg" results.txt
Id = abe4beb8-5a05-48a9-9f13-4dab3f0abe05
Status = Queued
Status = Completed
Result was written to results.txt
Any ideas how I can best approach this problem?