Individual word coordinates in XML

Is there a way to get individual word coordinates in the XML output of processImage()? Currently I have coordinates for lines, but I need a bounding box for each word.

 

Thank you!

Comments

4 comments

  • Permanently deleted user

    Hi,

    Unfortunately, Export to XML has no such feature. However, you can use one of the following workarounds:

    • Extract the Region object from each Word in the Paragraph::Words object. The Region object stores the coordinates of its area. The Paragraph itself can be obtained from Page::Layout::LayoutBlocks::Block, handling each type of Block separately.
    • Calculate the word coordinates from the coordinates of its characters. These can be obtained in the XML output after setting XMLExportParams::WriteCharAttributes = XCA_Basic.
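
A minimal Python sketch of the second workaround: merge per-character boxes from the exported XML into one box per word. It assumes each character is written as a `charParams` element with integer `l`, `t`, `r`, `b` attributes inside a `line` element, and that whitespace characters separate words. The sample document and the function name are made up for illustration, so check them against your own export.

```python
import xml.etree.ElementTree as ET


def _local(tag):
    """Strip an XML namespace prefix such as '{http://...}line'."""
    return tag.rsplit("}", 1)[-1]


def word_boxes_from_chars(xml_text):
    """Return [(word, (left, top, right, bottom)), ...] in reading order."""
    words = []
    root = ET.fromstring(xml_text)
    for line in root.iter():
        if _local(line.tag) != "line":
            continue
        text, box = "", None
        for ch in line.iter():
            if _local(ch.tag) != "charParams":
                continue
            glyph = ch.text or ""
            if glyph.strip() == "":
                # Whitespace ends the current word.
                if text:
                    words.append((text, box))
                text, box = "", None
                continue
            l, t, r, b = (int(ch.get(k)) for k in ("l", "t", "r", "b"))
            if box is None:
                box = (l, t, r, b)
            else:
                # Grow the word box to the union of its character boxes.
                box = (min(box[0], l), min(box[1], t), max(box[2], r), max(box[3], b))
            text += glyph
        if text:
            words.append((text, box))
    return words


# Hypothetical sample shaped like the assumed export format.
SAMPLE = """<page><block><text><par><line l="10" t="5" r="90" b="27"><formatting>
<charParams l="10" t="5" r="20" b="25">H</charParams>
<charParams l="22" t="8" r="30" b="25">i</charParams>
<charParams l="30" t="5" r="40" b="25"> </charParams>
<charParams l="40" t="6" r="55" b="24">y</charParams>
<charParams l="56" t="6" r="70" b="27">o</charParams>
</formatting></line></par></text></block></page>"""

if __name__ == "__main__":
    for word, box in word_boxes_from_chars(SAMPLE):
        print(word, box)
```

Words never span lines in this sketch, so a word hyphenated across a line break comes out as two boxes.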
  • Csaba Hajnal

    Tigran, try the ALTO XML export; it contains word-level information.
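
A rough Python sketch of reading word boxes from an ALTO file. In the ALTO schema each word is a `String` element with `CONTENT`, `HPOS`, `VPOS`, `WIDTH` and `HEIGHT` attributes. The sample document and the function name below are invented for illustration, and the units depend on the file's `MeasurementUnit`, which this sketch does not convert.

```python
import xml.etree.ElementTree as ET


def alto_word_boxes(alto_xml):
    """Return [(word, (left, top, right, bottom)), ...] from ALTO String elements."""
    boxes = []
    for el in ET.fromstring(alto_xml).iter():
        # Match on the local name so any ALTO namespace version works.
        if el.tag.rsplit("}", 1)[-1] != "String":
            continue
        x = float(el.get("HPOS"))
        y = float(el.get("VPOS"))
        w = float(el.get("WIDTH"))
        h = float(el.get("HEIGHT"))
        boxes.append((el.get("CONTENT"), (x, y, x + w, y + h)))
    return boxes


# Hypothetical minimal ALTO fragment.
ALTO_SAMPLE = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v2#">
  <Layout><Page><PrintSpace><TextBlock><TextLine>
    <String CONTENT="word" HPOS="100" VPOS="50" WIDTH="40" HEIGHT="12"/>
    <SP WIDTH="5" HPOS="140" VPOS="50"/>
    <String CONTENT="boxes" HPOS="145" VPOS="50" WIDTH="55" HEIGHT="12"/>
  </TextLine></TextBlock></PrintSpace></Page></Layout>
</alto>"""

if __name__ == "__main__":
    for word, box in alto_word_boxes(ALTO_SAMPLE):
        print(word, box)
```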

  • Csaba Hajnal

    I don't quite follow. Paragraph::Words::Word doesn't have any Region data. (I use FineReader Engine SDK v11.)

  • Permanently deleted user

    Hi Tigran,


    If you happen to be working with Python, you can do this with PDFMiner:

    Python 3:
    https://github.com/pdfminer/pdfminer.six

    Python 2:
    https://pypi.org/project/pdfminer/

    It seems that Apache PDFBox is also capable of this (I have not tried that part of PDFBox):
    https://stackoverflow.com/questions/33427686/getting-bounding-boxes-of-text-lines-from-a-pdf-using-pdfbox

    We use PDFMiner and PDFBox alongside ABBYY FineReader.
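
A sketch of how word boxes could be collected with pdfminer.six, assuming a PDF that has a text layer (for example, a searchable PDF exported from the OCR step). pdfminer.six reports boxes per character (`LTChar`) and per line, so the helper below merges character boxes into words, splitting on whitespace and on the synthetic `LTAnno` items, which carry no box. Both function names are invented for this example. PDF coordinates have their origin at the bottom-left of the page, unlike the top-left image coordinates in the XML export.

```python
def group_chars_into_words(items):
    """Merge items exposing get_text() and (for real glyphs) bbox=(x0, y0, x1, y1)."""
    words, text, box = [], "", None
    for item in items:
        glyph = item.get_text()
        bbox = getattr(item, "bbox", None)
        if bbox is None or glyph.isspace():
            # A space or a box-less annotation ends the current word.
            if text:
                words.append((text, box))
            text, box = "", None
            continue
        if box is None:
            box = tuple(bbox)
        else:
            box = (min(box[0], bbox[0]), min(box[1], bbox[1]),
                   max(box[2], bbox[2]), max(box[3], bbox[3]))
        text += glyph
    if text:
        words.append((text, box))
    return words


def pdf_word_boxes(path):
    """Return [(page_number, word, bbox), ...] for every text line in the PDF."""
    # Imported here so the grouping helper can be used without pdfminer.six installed.
    from pdfminer.high_level import extract_pages
    from pdfminer.layout import LTTextContainer, LTTextLine

    result = []
    for page_no, page in enumerate(extract_pages(path), start=1):
        for element in page:
            if not isinstance(element, LTTextContainer):
                continue
            for line in element:
                if isinstance(line, LTTextLine):
                    for word, bbox in group_chars_into_words(line):
                        result.append((page_no, word, bbox))
    return result
```

Calling `pdf_word_boxes("scan.pdf")` (with a file name of your own) would return one tuple per word.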

    Best regards
    Koen de Leijer

     

     
