Question
Text captured by FineReader Engine is ordered incorrectly in the output results. Is it possible to reorder the recognized text?
Answer
Yes, the recognized text can be reordered.
During document analysis, FineReader Engine detects different types of blocks on the image – text, tables, pictures, etc. Blocks detected on each page are stored in the Layout object corresponding to that page and can be accessed via the ILayout::Blocks property.
However, the blocks in the LayoutBlocks collection may not be ordered in the natural reading order. This is not an issue if recognition results are exported in formats such as DOC or PDF because the layout of the original document will be restored during document synthesis. But in formats like TXT or XML, the ordering issues may be more obvious.
This is where the ILayout::SortedBlocks property can be helpful. It returns the logically ordered collection of the blocks in the layout. The LayoutBlocks collection returned by this property can be used to access the ordered blocks via the API. More information on how to do that is available in the Developer's Guide article Guided Tour > Advanced Techniques > Working with Layout and Blocks.
Note that accessing the ILayout::SortedBlocks property does not reorder the layout itself. If the exported files need to follow the sorted order, the blocks returned by ILayout::SortedBlocks can be used to rebuild the ILayout::Blocks collection. Below is a general outline of the procedure:
- Perform document preprocessing and analysis (IFRDocument::Preprocess and IFRDocument::Analyze).
- For each page in the document:
  - Obtain the layout via the IFRPage::Layout property.
  - Obtain the sorted blocks via the ILayout::SortedBlocks property.
  - Append the sorted blocks one by one at the end of the ILayout::Blocks collection (ILayoutBlocks::AddNew) using the IBlock::Type and IBlock::Region properties. If the appended block corresponds to a table (IBlock::Type = BT_Table), the IFRPage::AnalyzeTable method should be called for that block.
  - Remove the old unsorted blocks from the beginning of the ILayout::Blocks collection.
- Perform document recognition and synthesis (IFRDocument::Recognize and IFRDocument::Synthesize).
C# code snippet:
private void sortLayoutBlocks(FREngine.IFRDocument document)
{
    foreach (FREngine.IFRPage page in document.Pages)
    {
        FREngine.ILayoutBlocks blocks = page.Layout.Blocks;

        // You can create a valid SortingBlocksParams object and change sorting parameters
        FREngine.SortingBlocksParams sortingBlockParams = null;
        FREngine.ILayoutBlocks sortedBlocks = page.Layout.SortedBlocks[sortingBlockParams];

        int numberOfBlocks = blocks.Count;
        int numberOfSortedBlocks = sortedBlocks.Count;

        // Append the sorted blocks to the unsorted blocks
        for (int iSortedBlock = 0; iSortedBlock < numberOfSortedBlocks; iSortedBlock++)
        {
            FREngine.IBlock sortedBlock = sortedBlocks[iSortedBlock];
            blocks.AddNew(sortedBlock.Type, sortedBlock.Region, numberOfBlocks + iSortedBlock);
            if (sortedBlock.Type == FREngine.BlockTypeEnum.BT_Table)
            {
                page.AnalyzeTable(numberOfBlocks + iSortedBlock, null, null, null);
            }
        }

        // Delete the first <numberOfBlocks> blocks, i.e. the unsorted blocks
        for (int i = 0; i < numberOfBlocks; i++)
        {
            blocks.DeleteAt(0);
        }
    }
}
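The append-then-delete index bookkeeping in the snippet above can be checked without the Engine installed. The following is only a minimal sketch under stated assumptions: a plain List<string> stands in for the ILayout::Blocks collection, and the class name BlockReorderSketch and the method ReplaceWithSorted are hypothetical helpers, not part of the FineReader Engine API. It shows that inserting each sorted item at position (original count + i) and then removing the first (original count) items leaves only the sorted sequence.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical model of the reordering step; not FineReader Engine code.
public static class BlockReorderSketch
{
    // 'blocks' plays the role of the unsorted ILayout::Blocks collection,
    // 'sorted' plays the role of the collection returned by ILayout::SortedBlocks.
    public static void ReplaceWithSorted(List<string> blocks, IReadOnlyList<string> sorted)
    {
        int originalCount = blocks.Count;

        // Append each sorted item after the existing items,
        // mirroring AddNew(type, region, numberOfBlocks + iSortedBlock).
        for (int i = 0; i < sorted.Count; i++)
        {
            blocks.Insert(originalCount + i, sorted[i]);
        }

        // Remove the original items from the front,
        // mirroring the repeated DeleteAt(0) calls.
        for (int i = 0; i < originalCount; i++)
        {
            blocks.RemoveAt(0);
        }
    }
}
```

For example, starting from the list "footer", "title", "body" and the sorted list "title", "body", "footer", calling BlockReorderSketch.ReplaceWithSorted(blocks, sorted) leaves blocks as "title", "body", "footer". Deleting at index 0 each time is what keeps the loop simple: every removal shifts the remaining items down by one, so the next old block is always first.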