Been trying the OCR API and have implementation questions. In the attached file, I've highlighted in purple the 3 areas I need to focus on. In the bottom right you'll see 2 purple rectangles, one labeled SHEET NUMBER and the other DRAWING TITLE. 1. I need to capture the value just below SHEET NUMBER (in the attached it corresponds to A5.0). 2. I need to capture the value just below DRAWING TITLE (in the attached it corresponds to ELEVATIONS).
The last thing I need is the hyperlink functionality. In the attached it is a purple circle. Within the circle, there is a top and a bottom value. The top indicates the link number (in the attached it corresponds to 5), so if there are 10 links on one page there will be 10 circles, each with a corresponding number 1 thru 10. The bottom value is the page to hyperlink to. 3. I need to capture the page value to hyperlink to (in the attached it corresponds to A6.5). I also need the coordinates within that page of the shape (circle) or the text (A6.5) so that I can create a hyperlink programmatically to that page at those coordinates.
I've looked through your documentation and it's not clear what the precise steps are to perform these 3 tasks. Do I need to call processDocument/processImage? Then do I need to call processTextField/processFields? Can these be combined? The file is at the attached link. (http://forum.ocrsdk.com/upfiles/design1.png) Thanks!
Comments
Please check the image link, I get a 404 error (file not found). You can send your image to CloudOCRSDK@abbyy.com.
I added the file again; it looks like it didn't work the first time.
If we are speaking about the ABBYY Cloud OCR SDK, you can try to use full-page recognition (the processImage or processDocument methods) with the textExtraction profile and export the recognized text with its coordinates to the XML format. Then you will be able to extract all the necessary data by parsing the output file on your side.
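For example, with the REST API the processImage call could look roughly like the sketch below (a minimal sketch, not the official sample client, assuming the standard cloud.ocrsdk.com endpoints; the APP_ID, APP_PASSWORD and file names are placeholders):

import java.io.*;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.file.*;
import java.util.Base64;
import java.util.regex.*;

public class TextExtractionSketch {

    static final String APP_ID = "your-application-id";              // placeholder
    static final String APP_PASSWORD = "your-application-password";  // placeholder

    public static void main(String[] args) throws Exception {
        byte[] image = Files.readAllBytes(Paths.get("design1.png")); // placeholder file name

        // 1. Submit the image for full-page recognition: all text, with coordinates, as XML.
        String submitUrl = "http://cloud.ocrsdk.com/processImage"
                + "?language=English&profile=textExtraction&exportFormat=xml";
        String response = call(submitUrl, image);

        // 2. processImage creates an asynchronous task; poll getTaskStatus until it completes.
        String taskId = attr(response, "id");
        String status = attr(response, "status");
        while (!"Completed".equals(status)) {
            if ("ProcessingFailed".equals(status)) throw new IOException("Task failed");
            Thread.sleep(2000);
            response = call("http://cloud.ocrsdk.com/getTaskStatus?taskId=" + taskId, null);
            status = attr(response, "status");
        }

        // 3. Download the XML result; it contains the recognized text with coordinates.
        String resultUrl = attr(response, "resultUrl").replace("&amp;", "&");
        Files.copy(new URL(resultUrl).openStream(), Paths.get("result.xml"),
                StandardCopyOption.REPLACE_EXISTING);
    }

    // POST the body (or GET when the body is null) with HTTP Basic authentication.
    static String call(String url, byte[] body) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        conn.setRequestMethod(body != null ? "POST" : "GET");
        conn.setRequestProperty("Authorization", "Basic " + Base64.getEncoder()
                .encodeToString((APP_ID + ":" + APP_PASSWORD).getBytes("UTF-8")));
        if (body != null) {
            conn.setDoOutput(true);
            conn.getOutputStream().write(body);
        }
        StringBuilder sb = new StringBuilder();
        try (BufferedReader r = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), "UTF-8"))) {
            for (String line; (line = r.readLine()) != null; ) sb.append(line);
        }
        return sb.toString();
    }

    // Pull a single attribute value out of the small task-status XML responses.
    static String attr(String xml, String name) {
        Matcher m = Pattern.compile(" " + name + "=\"([^\"]*)\"").matcher(xml);
        return m.find() ? m.group(1) : "";
    }
}

The downloaded result.xml then contains the recognized text together with its coordinates, which is what you would parse to locate the SHEET NUMBER, DRAWING TITLE and page-link values.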
Alternatively, have a look at ABBYY FlexiCapture Engine; it is a non-cloud data capture SDK designed to process structured documents.
Thanks for the feedback. I'm able to get the Sheet Number and Category using processDocument. However, I'm not able to get anything related to the page links I described above. So I've simplified the matter for the page links and am using a simple PDF and PNG of just one small set of markers.
I've tried processImage, processDocument and processTextField and don't get anything useful. It seems like a fairly simple thing to get the 4 characters from either the PDF or the PNG (the 4 instances of the 4 characters = A6.5). I've attached the XML results from each of the process calls. Note that for processTextField, I originally got nothing using the default textType, so I changed that to
textSettings.setTextType("normal,typewriter,matrix,index,handprinted,ocrA,ocrB");
which produced better results (in the XML files) but still couldn't recognize even one instance of A6.5 correctly. Is there some other setting for any of the process methods to get those 4 simple characters (A6.5) from the PDF/PNG? It seems like it should be something fairly simple, but I get nothing no matter what I try.

Sorry for the delay in response.
Could you please specify whether you used the textExtraction profile as I recommended in the previous answer? According to my tests, these parts of the page links can be recognized using this profile. This profile is suitable for extracting all text from the input image, including small text areas. Please note that in this case the document appearance and structure are ignored, and pictures and tables are not detected. The result will contain only text.
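For illustration, the XML result could then be parsed on your side roughly as follows (a sketch, assuming the ABBYY XML export schema in which each line element and its charParams children carry l/t/r/b pixel coordinates; the file name and the A6.5-style pattern are assumptions):

import java.io.File;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.*;

public class ResultXmlSketch {
    public static void main(String[] args) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().parse(new File("result.xml"));   // placeholder file name
        Pattern sheetRef = Pattern.compile("[A-Z]\\d+\\.\\d+");        // e.g. A6.5, A5.0 (assumed pattern)

        NodeList lines = doc.getElementsByTagName("line");
        for (int i = 0; i < lines.getLength(); i++) {
            Element line = (Element) lines.item(i);

            // Re-assemble the line text from its charParams children.
            StringBuilder text = new StringBuilder();
            NodeList chars = line.getElementsByTagName("charParams");
            for (int j = 0; j < chars.getLength(); j++) {
                text.append(chars.item(j).getTextContent());
            }
            String lineText = text.toString().trim();

            // The l/t/r/b attributes give the bounding box of the line in pixels,
            // which is the rectangle a hyperlink could later be anchored to.
            if (sheetRef.matcher(lineText).find()) {
                System.out.printf("%s at l=%s t=%s r=%s b=%s%n", lineText,
                        line.getAttribute("l"), line.getAttribute("t"),
                        line.getAttribute("r"), line.getAttribute("b"));
            }
        }
    }
}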
You can also try to use field-level recognition for your scenario. The processTextField method has the letterSet parameter. By using this parameter you can specify the letter set which should be used during recognition. I have managed to recognize the needed part of your image using the processTextField method and the following parameters:
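As an illustration only (the letterSet, textType and region values below are assumptions, not the exact parameters referred to above), such a request might be built along these lines:

import java.net.URLEncoder;

public class TextFieldSketch {
    public static void main(String[] args) throws Exception {
        // Values like "A6.5" only contain capital letters, digits and a dot.
        String letterSet = URLEncoder.encode("ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789.", "UTF-8");
        String url = "http://cloud.ocrsdk.com/processTextField"
                + "?language=English"
                + "&textType=normal,typewriter"   // assumed; adjust to the text types that actually occur
                + "&letterSet=" + letterSet
                + "&region=1200,2300,1400,2400";  // assumed field bounding box in pixels (left,top,right,bottom)
        System.out.println(url);
        // The image is then POSTed to this URL with the same authentication and
        // task-polling flow as in the processImage sketch above.
    }
}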
Thanks for the reply. I added the letterSet and was able to get all 4 sets of A6.5 (yahoo!). The problem is that when I tried to go back and run that against the original PDF file (saved as a PNG file... the same as I did for the small subset of images with JUST the 4 A6.5 annotations), I don't get any of the A6.5 values within the full image. I've attached the original version in full size. Can you try and see if your run on the attached is able to locate the AX.Y values in the full image? Thanks for your help, I think we're close to getting it working.
Please re-send the source image to CloudOCRSDK@abbyy.com. When we download the image from the site, it is modified (for example, its resolution is not the original one).
Sent both the PDF and PNG exported version.
I am still testing your images but unfortunately can't get a good result where all needed values are accurately recognized.
As the document includes technical drawings with text in small font sizes (8 points or smaller), we recommend using a resolution of 400-600 dpi or even higher for better recognition results. Is it possible for you to scan your documents or form your PDF files at the recommended resolution?
I was able to save it at 300 DPI (that's as high a resolution as I can save it at). I sent it to CloudOCRSDK@abbyy.com.
Thank you! Now it is much better and we can extract all the text that we need from the input image. I've sent my results to you by e-mail.
Thanks, I see the results are better. However, I can't parse the XML file because of validation errors (http://forum.ocrsdk.com/questions/4775/xml-parse-error-of-processimage-results-file).