I am new to the ABBYY SDK. I went through the documentation, but it provides little information about using it from Java.
My task is to load a PDF with multiple pages. For each page, I want to run layout analysis so that the output states whether a particular block is text, a table, an image, a signature, a header, a footer, etc., and then OCR each block. Then save each image separately as text.
I am able to load the PDF, extract text from each page, and save the complete file as XML.
I could not find:
- How can I process each page separately? Also, is multithreading possible for this?
- If pieces of text in the image are on the same line but far apart from each other, for example:
"name: abc date:1/1/1"
ABBYY gives: "name: abc date:1/1/1"
I want: "name: abc"; "date:1/1/1"
Is it possible to do so?
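To make the second point concrete, this is roughly the splitting I have in mind, written as a plain Java sketch. It assumes I can somehow get each recognized word together with its left and right pixel coordinates on the line. The `Word` class, the `splitOnGaps` method, and the 50-pixel threshold are my own placeholders for illustration, not part of the ABBYY API.

```java
import java.util.ArrayList;
import java.util.List;

final class GapSplitter {

    /** Hypothetical holder for one recognized word and its horizontal extent in pixels. */
    static final class Word {
        final String text;
        final int left;
        final int right;

        Word(String text, int left, int right) {
            this.text = text;
            this.left = left;
            this.right = right;
        }
    }

    /**
     * Splits one line of words (assumed sorted left to right) into segments,
     * starting a new segment whenever the gap to the previous word exceeds maxGap.
     */
    static List<String> splitOnGaps(List<Word> words, int maxGap) {
        List<String> segments = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        Word previous = null;
        for (Word word : words) {
            // A wide horizontal gap closes the current segment.
            if (previous != null && word.left - previous.right > maxGap) {
                segments.add(current.toString());
                current.setLength(0);
            }
            if (current.length() > 0) {
                current.append(' ');
            }
            current.append(word.text);
            previous = word;
        }
        if (current.length() > 0) {
            segments.add(current.toString());
        }
        return segments;
    }

    public static void main(String[] args) {
        // Made-up coordinates that mimic the example line above.
        List<Word> line = List.of(
                new Word("name:", 10, 60),
                new Word("abc", 70, 100),
                new Word("date:1/1/1", 400, 500));
        System.out.println(splitOnGaps(line, 50));
    }
}
```

With these made-up coordinates the line is split into "name: abc" and "date:1/1/1". What I do not know is whether the SDK exposes word positions like this, or whether it can do the split itself during layout analysis.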
Thanks for any kind of help.