Hi, I need to represent the text from a recognized image in a specific structure: a list of lines, where each line stores its words and each word stores its own rectangle coordinates. I'm using a method ported from a previous version of FRE (ABBYY FineReader Engine). Could you please give me a hint on how I can improve it, or achieve similar functionality another way? Processing the results takes 20 seconds, while the image recognition itself completes in 5 seconds.
All FRE calls run on a single STA thread.
private DocPage ExtractData(FREngine.FRPage page)
{
    var docPage = new DocPage();
    var layout = page.Layout;
    var cp = engine.CreateCharParams();   // parameters of a word's first character
    var cp2 = engine.CreateCharParams();  // parameters of a word's last character
    for (var blocksCounter = 0; blocksCounter < layout.Blocks.Count; blocksCounter++)
    {
        var currentBlock = layout.Blocks[blocksCounter];
        var textblock = currentBlock.GetAsTextBlock();
        if (textblock == null)
            continue; // only text blocks contain paragraphs

        for (var paragraphCounter = 0;
             paragraphCounter < textblock.Text.Paragraphs.Count; paragraphCounter++)
        {
            var currentParagraph = textblock.Text.Paragraphs[paragraphCounter];

            // Start positions of every line and every word in this paragraph.
            var linesFirstChars = new int[currentParagraph.Lines.Count];
            var wordsFirstChars = new int[currentParagraph.Words.Count];
            for (int linesCounter = 0; linesCounter < currentParagraph.Lines.Count; linesCounter++)
                linesFirstChars[linesCounter] = currentParagraph.Lines[linesCounter].FirstCharIndex;
            for (int wordsCounter = 0; wordsCounter < currentParagraph.Words.Count; wordsCounter++)
                wordsFirstChars[wordsCounter] = currentParagraph.Words[wordsCounter].FirstSymbolPosition;

            DocLine currentLine = null;
            DocWord currentWord = null;
            // Walks every character; the condition re-reads currentParagraph.Text through COM each time.
            for (int linesCounter = 0, wordsCounter = 0, charCounter = 0;
                 charCounter < currentParagraph.Text.Length; charCounter++)
            {
                // A new line starts here: close the previous line and open a new one.
                if (linesFirstChars.Length > linesCounter &&
                    charCounter >= linesFirstChars[linesCounter])
                {
                    SetLineRectangle(currentLine);
                    linesCounter++;
                    docPage.Lines.Add(new DocLine());
                    currentLine = docPage.Lines.Last();
                }
                // A new word starts here: add it to the current line with its bounding box.
                if (wordsFirstChars.Length > wordsCounter &&
                    charCounter >= wordsFirstChars[wordsCounter])
                {
                    currentLine.Words.Add(new DocWord());
                    currentWord = currentLine.Words.Last();
                    currentWord.Text = currentParagraph.Words[wordsCounter].Text;
                    currentParagraph.GetCharParams(charCounter, cp);
                    // Index of the word's last character, clamped to the paragraph.
                    int len = charCounter + currentWord.Text.Length - 1;
                    if (len >= currentParagraph.Length) len = currentParagraph.Length - 1;
                    currentParagraph.GetCharParams(len, cp2);
                    currentWord.Rectangle = new System.Drawing.Rectangle(cp.Left, cp.Top, cp2.Right - cp.Left, cp.Bottom - cp.Top);
                    wordsCounter++;
                }
            }
            // The paragraph's last line is never followed by a new line start, so close it here.
            SetLineRectangle(currentLine);
        }
    }
    return docPage;
}

// Sets the line rectangle from the first word's top-left corner and the last word's bottom-right corner.
private static void SetLineRectangle(DocLine line)
{
    if (line == null || line.Words.Count == 0)
        return;
    var first = line.Words[0].Rectangle;
    var last = line.Words[line.Words.Count - 1].Rectangle;
    line.Rectangle = System.Drawing.Rectangle.FromLTRB(first.Left, first.Top, last.Right, last.Bottom);
}
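The same grouping can be sketched without the per-character loop. This is a minimal, self-contained sketch under stated assumptions: it presumes each paragraph's FRE data (`Lines[i].FirstCharIndex`, `Words[i].FirstSymbolPosition`, `Words[i].Text`, `Length`, and the character boxes returned by `GetCharParams`) has already been read once into plain arrays, so no COM property is re-evaluated inside a loop condition. `GroupWords`, `DemoCharRect` and the sample data are hypothetical, and the named tuples only stand in for `DocLine`/`DocWord`. Whether caching the data and walking words instead of characters removes most of the 20 seconds is an assumption that has not been measured here.

```csharp
using System;
using System.Collections.Generic;
using System.Drawing;

// Hypothetical sample data standing in for one FRE paragraph whose COM data has
// already been copied once into plain .NET arrays. Text: "Hello world " + "again".
int[] demoLineStarts = { 0, 12 };                       // like Lines[i].FirstCharIndex
int[] demoWordStarts = { 0, 6, 12 };                    // like Words[i].FirstSymbolPosition
string[] demoWordTexts = { "Hello", "world", "again" }; // like Words[i].Text
const int demoTextLength = 17;                          // like currentParagraph.Length

var lines = GroupWords(demoLineStarts, demoWordStarts, demoWordTexts, demoTextLength, DemoCharRect);
foreach (var docLine in lines)
{
    Console.WriteLine($"Line {docLine.Rect}");
    foreach (var docWord in docLine.Words)
        Console.WriteLine($"  {docWord.Text} {docWord.Rect}");
}

// Hypothetical per-character box standing in for GetCharParams: 10 px per character,
// characters 0..11 on the first line (y = 0), characters 12.. on the second (y = 20).
static Rectangle DemoCharRect(int index) => index < 12
    ? new Rectangle(index * 10, 0, 10, 15)
    : new Rectangle((index - 12) * 10, 20, 10, 15);

// Groups words into lines in one pass over the words (not over every character).
static List<(Rectangle Rect, List<(string Text, Rectangle Rect)> Words)> GroupWords(
    int[] lineStarts, int[] wordStarts, string[] wordTexts,
    int textLength, Func<int, Rectangle> charRect)
{
    var result = new List<(Rectangle Rect, List<(string Text, Rectangle Rect)> Words)>();
    if (lineStarts.Length == 0 || textLength == 0) return result;

    // One bucket of words per line.
    var buckets = new List<(string Text, Rectangle Rect)>[lineStarts.Length];
    for (var i = 0; i < buckets.Length; i++) buckets[i] = new List<(string Text, Rectangle Rect)>();

    var lineIndex = 0;
    for (var w = 0; w < wordStarts.Length; w++)
    {
        var first = wordStarts[w];
        // Advance to the last line that starts at or before this word.
        while (lineIndex + 1 < lineStarts.Length && lineStarts[lineIndex + 1] <= first) lineIndex++;

        // Last character of the word, clamped to the paragraph text.
        var last = Math.Max(first, Math.Min(first + wordTexts[w].Length - 1, textLength - 1));
        var rect = Rectangle.Union(charRect(first), charRect(last));
        buckets[lineIndex].Add((wordTexts[w], rect));
    }

    // Line box = union of its word boxes; lines without words are skipped.
    foreach (var words in buckets)
    {
        if (words.Count == 0) continue;
        var lineRect = words[0].Rect;
        foreach (var word in words) lineRect = Rectangle.Union(lineRect, word.Rect);
        result.Add((lineRect, words));
    }
    return result;
}
```

Each word's box is the union of its first and last character boxes, and each line's box is the union of its word boxes, so the right edge always comes from the rightmost word and the final line of a paragraph is closed like every other line.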
Comments
We processed our standard Demo.tif using all the methods you use and could not reproduce the slowdown on our side: processing the results takes 2 seconds.
Generally, performance depends on the image structure and quality, the settings, and the machine configuration. You can influence performance by selecting appropriate settings. To increase processing speed, please refer to Developer’s Help → Guided Tour → Best Practices → Increasing Processing Speed, and to the following article in our Knowledge Base: http://knowledgebase.ocrsdk.com/article/1222.
If these recommendations do not help, please send a simple sample project and an AInfo report to SDK_support@abbyy.com so that we can take a closer look at the issue and give you appropriate recommendations.