Question
How to improve table recognition quality?
Answer
In some cases, the output layout is corrupted because the tables in the document were not detected.
- Make sure that the images are of sufficient quality. The recommended input is 300 DPI, color or grayscale.
- If the documents are of good quality, make sure that none of the settings below are used, because they turn off table detection:
  - IPageAnalysisParams.EnableTextExtractionMode = true;
  - IPageAnalysisParams.DetectTables = false;
  - FREngine.LoadPredefinedProfile("TextExtraction_Accuracy");
- To make table detection a priority for the analyzer, set the following parameter:
  - IPageAnalysisParams.AggressiveTableDetection = true;
- In rare cases, FineReader Engine cannot detect tables even when forced. For example, this happens if the table has a lot of decorative formatting, has no clear separators, or uses decorative fonts that are not recognized clearly.
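The parameter settings from the list above can be sketched in C# as follows. This is a minimal sketch, not a definitive implementation: the three IPageAnalysisParams properties come from this article, while the helper name ProcessWithTableDetection, the way the parameters object is obtained (CreateDocumentProcessingParams, PageProcessingParams.PageAnalysisParams) and the Process call are assumptions that should be checked against the API reference for your Engine version.

```csharp
// Hypothetical helper: assumes the FREngine interop assembly is referenced and
// that `engine` and `document` were created and loaded elsewhere.
static void ProcessWithTableDetection(FREngine.IEngine engine, FREngine.IFRDocument document)
{
    // Assumption: processing parameters are created by the engine and expose
    // the page analysis parameters through this property path.
    FREngine.IDocumentProcessingParams processingParams = engine.CreateDocumentProcessingParams();
    FREngine.IPageAnalysisParams analysisParams = processingParams.PageProcessingParams.PageAnalysisParams;

    // Keep table detection switched on.
    analysisParams.EnableTextExtractionMode = false;
    analysisParams.DetectTables = true;

    // Make table detection a priority for the analyzer.
    analysisParams.AggressiveTableDetection = true;

    // Assumption: the document is analyzed and recognized with these parameters.
    document.Process(processingParams);
}
```

Setting the values explicitly makes the intent visible in the code and avoids relying on whatever a previously loaded profile may have left in place.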
One last method applies only to pages that consist of a single table, with no pictures or text blocks outside it. In that case, you can create a table block covering the whole page area and force analysis of that block. C# code sample:
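// Build a region that covers the whole page image.
FREngine.IRegion wholePageRegion = engineLoader.Engine.CreateRegion();
wholePageRegion.AddRect(0, 0, document.Pages[0].ImageDocument.BlackWhiteImage.Width, document.Pages[0].ImageDocument.BlackWhiteImage.Height);
// Add a table block with that region to the page layout.
FREngine.IBlock block = document.Pages[0].Layout.Blocks.AddNew(FREngine.BlockTypeEnum.BT_Table, wholePageRegion);
// Table-specific view of the block (not used further in this sample).
FREngine.ITableBlock tableBlock = block.GetAsTableBlock();
// Analyze the table structure of the block at index 0.
// This assumes the new block is the first block in the layout.
document.Pages[0].AnalyzeTable(0);
// Recognize the text and synthesize the document.
document.Recognize();
document.Synthesize();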