Is there a way to get individual word coordinates in the XML output of processImage()? Currently I have coordinates for lines, but I need a bounding box for each word.
Thank you!
Comments
4 comments
Hi,
Unfortunately, there is no such feature in Export to XML. Nevertheless, you can use the following workarounds:
Tigran, try out ALTO XML export, it contains word-level information
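For illustration, here is a minimal sketch of reading word boxes from an ALTO file in Python. It assumes the export follows the standard ALTO schema, where each word is a `String` element with `CONTENT`, `HPOS`, `VPOS`, `WIDTH` and `HEIGHT` attributes. The function name `alto_word_boxes` and the sample document are made up for the example, and I have not checked this against actual FineReader Engine output.

```python
import xml.etree.ElementTree as ET


def alto_word_boxes(alto_xml: str):
    """Return (text, left, top, width, height) for every ALTO <String> element."""
    root = ET.fromstring(alto_xml)
    words = []
    for elem in root.iter():
        # Compare local names only, so any ALTO namespace version is accepted.
        if elem.tag.rsplit("}", 1)[-1] != "String":
            continue
        words.append((
            elem.get("CONTENT", ""),
            float(elem.get("HPOS", 0)),
            float(elem.get("VPOS", 0)),
            float(elem.get("WIDTH", 0)),
            float(elem.get("HEIGHT", 0)),
        ))
    return words


# Hypothetical two-word sample, not real engine output.
SAMPLE = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v3#">
  <Layout><Page><PrintSpace><TextBlock><TextLine>
    <String CONTENT="Hello" HPOS="10" VPOS="20" WIDTH="50" HEIGHT="12"/>
    <SP WIDTH="4" HPOS="60" VPOS="20"/>
    <String CONTENT="world" HPOS="64" VPOS="20" WIDTH="48" HEIGHT="12"/>
  </TextLine></TextBlock></PrintSpace></Page></Layout>
</alto>"""

for word in alto_word_boxes(SAMPLE):
    print(word)
```

The coordinates are in whatever unit the file declares in its `MeasurementUnit` element, so check that before converting to pixels.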
I don't quite understand. Paragraph::Words:Word doesn't have any Region data. (I use FineReader Engine SDK v11.)
Hi Tigran,
If you happen to be working with Python, you can do this with PDFMiner:
Python 3:
https://github.com/pdfminer/pdfminer.six
Python 2:
https://pypi.org/project/pdfminer/
It seems that Apache PDFBox is also capable of this (I have not tried that part of PDFBox):
https://stackoverflow.com/questions/33427686/getting-bounding-boxes-of-text-lines-from-a-pdf-using-pdfbox
We use PDFMiner and PDFBox alongside ABBYY FineReader.
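As a rough sketch of the PDFMiner route (Python 3, pdfminer.six): the layout tree goes down to text lines and single characters, not words, so the example below merges character boxes into word boxes itself by splitting on whitespace. The helper names `group_chars_into_words` and `pdf_word_boxes` are my own, not part of PDFMiner, and the splitting rule is an assumption that may need tuning for your documents.

```python
def group_chars_into_words(chars):
    """Merge (text, (x0, y0, x1, y1)) character items into (word, bbox) pairs.

    Any whitespace item ends the current word.
    """
    words, text, box = [], "", None
    for ch, (x0, y0, x1, y1) in chars:
        if ch.isspace():
            if text:
                words.append((text, box))
            text, box = "", None
            continue
        text += ch
        if box is None:
            box = (x0, y0, x1, y1)
        else:
            # Grow the word box so it also covers this character.
            box = (min(box[0], x0), min(box[1], y0),
                   max(box[2], x1), max(box[3], y1))
    if text:
        words.append((text, box))
    return words


def pdf_word_boxes(pdf_path):
    """Yield (page_number, word, bbox) for each word in a PDF file."""
    # Imported here so the helper above also works without pdfminer.six installed.
    from pdfminer.high_level import extract_pages
    from pdfminer.layout import LTChar, LTTextContainer, LTTextLine

    for page_no, page in enumerate(extract_pages(pdf_path), start=1):
        for block in page:
            if not isinstance(block, LTTextContainer):
                continue
            for line in block:
                if not isinstance(line, LTTextLine):
                    continue
                # Non-LTChar items (LTAnno) are inserted spaces or newlines
                # and carry no box, so treat them as word separators.
                chars = [
                    (obj.get_text(), obj.bbox) if isinstance(obj, LTChar)
                    else (" ", (0, 0, 0, 0))
                    for obj in line
                ]
                for word, box in group_chars_into_words(chars):
                    yield page_no, word, box
```

Note that PDFMiner reports boxes in PDF points with the origin at the bottom-left of the page, so the y axis is flipped compared with typical image coordinates.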
Best regards
Koen de Leijer