How can table structure be restored based on the XML output?
Output XML files for documents that contain tables include block elements with the blockType attribute set to Table. A block element of type Table contains child row elements that represent table rows, and a row element, in turn, contains cell elements that represent table cells. A block element also includes the l, t, r, and b attributes that correspond, respectively, to the left, top, right, and bottom coordinates of the whole block. These attributes, l and t in particular, will be necessary to restore the table structure based on the information contained in the cell elements.
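As a rough illustration of the structure described above, the following Python sketch locates the table blocks and reads their origin. The sample XML is a simplified, hypothetical fragment that contains only the elements and attributes mentioned here; real output carries more detail and may use an XML namespace, which is why the helper `local_name` (an assumption of this sketch, not part of any API) compares tag names with the namespace stripped.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified sample; real output files contain more
# elements and attributes than are shown here.
SAMPLE = """<document>
  <page>
    <block blockType="Text" l="10" t="10" r="200" b="40"/>
    <block blockType="Table" l="100" t="200" r="400" b="300">
      <row>
        <cell width="100" height="50"/>
        <cell width="200" height="50"/>
      </row>
    </block>
  </page>
</document>"""


def local_name(tag):
    """Drop a '{namespace}' prefix, if any, from an element tag."""
    return tag.rsplit("}", 1)[-1]


def find_table_blocks(root):
    """Return all block elements whose blockType attribute is 'Table'."""
    return [
        el for el in root.iter()
        if local_name(el.tag) == "block" and el.get("blockType") == "Table"
    ]


root = ET.fromstring(SAMPLE)
tables = find_table_blocks(root)
for block in tables:
    # l and t give the top-left corner from which cell positions are derived.
    origin = (int(block.get("l")), int(block.get("t")))
    rows = [child for child in block if local_name(child.tag) == "row"]
    print(origin, len(rows))
```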
Parse the table row by row and, within each row, cell by cell. Each cell element includes the width and height attributes, so for the first cell in the first row the left, top, right, and bottom coordinates are:
- left – l from the block element;
- top – t from the block element;
- right – left + width from the cell element;
- bottom – top + height from the cell element.
For each subsequent cell in a row, the left coordinate is equal to the right coordinate of the preceding cell. For cells in each subsequent row, the top coordinate is equal to the bottom coordinate of the preceding row.
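The procedure can be sketched in Python as follows. This is a minimal illustration, not a complete implementation: it assumes that the table has no merged cells, that all cells in a row share the same height, and that the attribute values are integers. The sample XML and the names `cell_rectangles` and `local_name` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified table block with two rows of two cells each.
SAMPLE = """<block blockType="Table" l="100" t="200" r="400" b="300">
  <row>
    <cell width="100" height="50"/>
    <cell width="200" height="50"/>
  </row>
  <row>
    <cell width="100" height="50"/>
    <cell width="200" height="50"/>
  </row>
</block>"""


def local_name(tag):
    """Drop a '{namespace}' prefix, if any, from an element tag."""
    return tag.rsplit("}", 1)[-1]


def cell_rectangles(block):
    """Return, for each row, a list of (left, top, right, bottom) tuples."""
    table_left = int(block.get("l"))
    top = int(block.get("t"))
    result = []
    for row in block:
        if local_name(row.tag) != "row":
            continue
        left = table_left          # every row starts at the block's left edge
        bottom = top
        rects = []
        for cell in row:
            if local_name(cell.tag) != "cell":
                continue
            right = left + int(cell.get("width"))
            bottom = top + int(cell.get("height"))
            rects.append((left, top, right, bottom))
            left = right           # next cell starts where this one ends
        result.append(rects)
        top = bottom               # next row starts where this one ends
    return result


rects = cell_rectangles(ET.fromstring(SAMPLE))
for row_rects in rects:
    print(row_rects)
```

Tables with merged cells would need extra bookkeeping, since a cell that spans several rows or columns shifts the starting position of its neighbors; that case is deliberately left out of this sketch.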