Text is ignored half the time

version: 11.1.19.72

I run this code with the EngineLoader from the sample code:

engineLoader.Engine.LoadPredefinedProfile("DocumentConversion_Accuracy");

//create document
FR.FRDocument document = engineLoader.Engine.CreateFRDocument();

//get and add screenshot
System.Drawing.Image screenShot = this.GetScreenShot();

using (MemoryStream m = new MemoryStream())
{
	screenShot.Save(m, System.Drawing.Imaging.ImageFormat.Png);
	m.Position = 0;
	document.AddImageFileFromStream(new ABBYReadStream(m));
}

//process and synthesize
document.Process();
document.Synthesize();

//find the text
int posX = 0;
int posY = 0;
for (int x = 0; x < document.Pages.Count; x++)
{
	FR.LayoutBlocks blocks = document.Pages[x].Layout.Blocks;
	for (int y = 0; y < blocks.Count; y++)
	{
		FR.IBlock block = blocks[y];
		if (block.Type == FR.BlockTypeEnum.BT_Text)
		{
			FR.TextBlock textBlock = block.GetAsTextBlock();
			for (int z = 0; z < textBlock.Text.Paragraphs.Count; z++)
			{
				//need to use the options & regex in UIAutomationHelper
				FR.Paragraph paragraph = textBlock.Text.Paragraphs[z];
				if (paragraph.Text != text)
					continue;

				//find middle point of text
				posX = paragraph.Left + (paragraph.Right - paragraph.Left) / 2;
				posY = paragraph.Top + (paragraph.Bottom - paragraph.Top) / 2;
			}
		}
	}
}
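As a side note on the comparison step in the loop above: an exact `paragraph.Text != text` check can miss a paragraph that was recognized but differs in whitespace or letter case. Below is a minimal sketch of a more tolerant comparison, plus the midpoint arithmetic pulled into one place. `ParagraphMatch`, `Normalize`, `IsMatch`, and `Center` are hypothetical helper names and not part of the ABBYY API. This only affects matching of text that was recognized; it does not help when the engine skips the text entirely.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the ABBYY FineReader Engine API.
public static class ParagraphMatch
{
    // Collapse runs of whitespace (including line breaks) into single
    // spaces and trim the ends, so layout differences in the recognized
    // text do not defeat the comparison.
    public static string Normalize(string s) =>
        Regex.Replace(s ?? string.Empty, @"\s+", " ").Trim();

    // Case-insensitive comparison of the normalized strings.
    public static bool IsMatch(string recognized, string wanted) =>
        string.Equals(Normalize(recognized), Normalize(wanted),
                      StringComparison.OrdinalIgnoreCase);

    // Midpoint of a rectangle given its edges (integer division).
    public static (int X, int Y) Center(int left, int top, int right, int bottom) =>
        (left + (right - left) / 2, top + (bottom - top) / 2);
}
```

In the loop, the check would then read `if (!ParagraphMatch.IsMatch(paragraph.Text, text)) continue;`, and the coordinates could come from `ParagraphMatch.Center(paragraph.Left, paragraph.Top, paragraph.Right, paragraph.Bottom)`.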

The image I add to the document is the screenshot provided, but half the time the parts circled in red are not found in the text after document synthesis takes place. I have also tried using the "Default" engine profile.

Any ideas why these parts of the image are sometimes ignored?

Edit: I have also verified that the whole image is added to the document by exporting the document afterwards as a PDF, so it's not cutting off part of the image.

2 comments

Permanently deleted user

2017-10-07 17:20
This looks like the same question I asked recently.

Have you already checked the answer here? https://forum.ocrsdk.com/thread/some-parts-of-a-specific-pdf-are-not-ocr-ed-by-abbyy-finereader-engine/

Permanently deleted user

2017-10-10 14:42
I had not seen that answer, thank you for pointing it out. Switching to a profile where the parameters mentioned in that thread are set to true (in my case DocumentArchiving_Accuracy) solved the issue.
