
How to detect the start and end of a table? (PDF to .txt, Python)

How can I detect the start and end of a table and add tags to the .txt output to mark where a table is?

For example, I need something like this:

 

-#tag(start)

PROCESS TIME TEMP IN DEG F COOLING METHOD

HARDEN 6 HOURS 1575 QUENCHED TO QUENCH TEMPERATURES

POLYMER QUENCHEDA: 101-106 B: 104-110 C: 97-103

TEMPER 9 HOURS A-B: 1056 C: 1066 WATER QUENCHED

-#tag(end)

TENSILEYIELD .2% OFF

YIELD .6% EUL

ELONG IN 2" REDUCTION

148,200

 

Or is it possible to add ' | ' to delimit cells in the .txt?


Comments

3 comments

  • Jeen Dzul

    ????

  • Vishnu Vardhan

    Hi Jeen Dzul,

    As far as I know, ABBYY does not retain table start and end information in .txt output. You can try the XML output from ABBYY using the documentConversion profile (which retains document structure) instead of textExtraction. Refer here for more on profiles.

    In the XML response, look for a block tag with the attribute blockType="Table", like this:

    <block blockType="Table">

       <row>

          <cell></cell>

          <cell></cell>

       </row>

    </block>

    Refer here to learn more about the XML tags.
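A rough Python sketch of how that XML could be turned into the tagged .txt from the question. It assumes only the block / row / cell layout shown above; the function name `blocks_to_text`, the helper names, and the sample document are made up for illustration. Real ABBYY output uses an XML namespace and nests more elements inside each cell, so the sketch ignores namespaces and simply joins whatever text it finds. It also assumes blocks are not nested inside other blocks.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Drop an XML namespace prefix, e.g. '{ns}block' -> 'block'."""
    return tag.rsplit("}", 1)[-1]


def children(elem, name):
    """Direct children of elem whose tag (without namespace) equals name."""
    return [c for c in elem if local_name(c.tag) == name]


def joined_text(elem):
    """All text under elem, with runs of whitespace collapsed to one space."""
    return " ".join("".join(elem.itertext()).split())


def blocks_to_text(xml_string, start_tag="-#tag(start)", end_tag="-#tag(end)"):
    """Hypothetical helper: wrap table blocks in tags and join cells with ' | '."""
    root = ET.fromstring(xml_string)
    lines = []
    for elem in root.iter():
        if local_name(elem.tag) != "block":
            continue
        if elem.get("blockType") == "Table":
            lines.append(start_tag)
            for row in children(elem, "row"):
                cells = [joined_text(c) for c in children(row, "cell")]
                lines.append(" | ".join(cells))
            lines.append(end_tag)
        else:
            # Any non-table block is written out as plain text.
            text = joined_text(elem)
            if text:
                lines.append(text)
    return "\n".join(lines)


# Made-up sample in the shape shown above, not real ABBYY output.
sample = """<document>
  <block blockType="Table">
    <row><cell>PROCESS</cell><cell>TIME</cell></row>
    <row><cell>HARDEN</cell><cell>6 HOURS</cell></row>
  </block>
  <block blockType="Text">TENSILE 148,200</block>
</document>"""

print(blocks_to_text(sample))
```

For the sample, the table rows come out between `-#tag(start)` and `-#tag(end)` with cells separated by ' | ', and the text block follows as a normal line. Check the tag names against your own XML response before relying on this.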

     

    Thanks,

    Vishnu

  • Jeen Dzul

    But is this possible if the file I am scanning is a PDF?

