How to reorder the recognized text

Question

Text captured by FineReader Engine is ordered incorrectly in the output results. Is it possible to reorder the recognized text?

Answer

Yes, the recognized text can be reordered.

During document analysis, FineReader Engine detects different types of blocks on the image – text, tables, pictures, etc. Blocks detected on each page are stored in the Layout object corresponding to that page and can be accessed via the ILayout::Blocks property.

However, the blocks in this LayoutBlocks collection are not necessarily stored in natural reading order. This does not matter when recognition results are exported to formats such as DOC or PDF, because the layout of the original document is restored during document synthesis. In formats such as TXT or XML, however, the ordering issues can become noticeable.

This is where the ILayout::SortedBlocks property can help. It returns a LayoutBlocks collection that contains the blocks of the layout in logical order, and this collection can be used to access the ordered blocks via the API. For details on working with blocks, see the Developer's Guide article Guided Tour > Advanced Techniques > Working with Layout and Blocks.
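
For read-only access in reading order, for example to assemble plain text yourself, iterating over the sorted collection may be enough. The following is a minimal sketch rather than a definitive implementation: the class and method names are hypothetical, and the GetAsTextBlock, Text and Paragraphs members are assumed from the FineReader Engine object model, so check them against the API reference for your Engine version. Non-text blocks such as tables and pictures are skipped.

```csharp
using System.Text;

public static class ReadingOrderSketch
{
    // Hypothetical helper: collects paragraph text from text blocks in logical order.
    public static string GetTextInReadingOrder(FREngine.IFRDocument document)
    {
        StringBuilder result = new StringBuilder();
        foreach (FREngine.IFRPage page in document.Pages)
        {
            // null is passed for the sorting parameters, as in the article's own snippet
            FREngine.SortingBlocksParams sortingParams = null;
            FREngine.ILayoutBlocks sortedBlocks = page.Layout.SortedBlocks[sortingParams];
            for (int i = 0; i < sortedBlocks.Count; i++)
            {
                FREngine.IBlock block = sortedBlocks[i];
                // Only text blocks are handled in this sketch
                if (block.Type != FREngine.BlockTypeEnum.BT_Text)
                {
                    continue;
                }
                // Assumed members: IBlock.GetAsTextBlock, ITextBlock.Text, IText.Paragraphs
                FREngine.ITextBlock textBlock = block.GetAsTextBlock();
                FREngine.IParagraphs paragraphs = textBlock.Text.Paragraphs;
                for (int j = 0; j < paragraphs.Count; j++)
                {
                    result.AppendLine(paragraphs[j].Text);
                }
            }
        }
        return result.ToString();
    }
}
```

This approach leaves the layout untouched, so it does not affect files produced by the Engine's own export.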

Note that accessing the ILayout::SortedBlocks property does not reorder the layout itself; it only returns the blocks in logical order. If the exported files must reflect the same order, the layout can be rebuilt from the sorted collection. The general approach is outlined below:

C# code snippet:

private void sortLayoutBlocks(FREngine.IFRDocument document)
{
  foreach (FREngine.IFRPage page in document.Pages)
  {
    FREngine.ILayoutBlocks blocks = page.Layout.Blocks;
    // null is passed here; to change the sorting parameters, create a valid SortingBlocksParams object and pass it instead
    FREngine.SortingBlocksParams sortingBlockParams = null;
    FREngine.ILayoutBlocks sortedBlocks = page.Layout.SortedBlocks[sortingBlockParams];
    int numberOfBlocks = blocks.Count;
    int numberOfSortedBlocks = sortedBlocks.Count;
    // Append the sorted blocks to the unsorted blocks
    for (int iSortedBlock = 0; iSortedBlock < numberOfSortedBlocks; iSortedBlock++)
    {
      FREngine.IBlock sortedBlock = sortedBlocks[iSortedBlock];
      blocks.AddNew(sortedBlock.Type, sortedBlock.Region, numberOfBlocks + iSortedBlock);
      // Run table analysis on the newly added copy of each table block
      if (sortedBlock.Type == FREngine.BlockTypeEnum.BT_Table)
      {
        page.AnalyzeTable(numberOfBlocks + iSortedBlock, null, null, null);
      }
    }
    // Delete the first <numberOfBlocks> blocks, i.e. the unsorted blocks
    for (int i = 0; i < numberOfBlocks; i++)
    {
      blocks.DeleteAt(0);
    }
  }
}
