version: 11.1.19.72
I run the following code, using the EngineLoader from the sample code:
engineLoader.Engine.LoadPredefinedProfile("DocumentConversion_Accuracy");

// Create the document
FR.FRDocument document = engineLoader.Engine.CreateFRDocument();

// Get the screenshot and add it to the document
System.Drawing.Image screenShot = this.GetScreenShot();
using (MemoryStream m = new MemoryStream())
{
    screenShot.Save(m, System.Drawing.Imaging.ImageFormat.Png);
    m.Position = 0;
    document.AddImageFileFromStream(new ABBYReadStream(m));
}

// Process and synthesize
document.Process();
document.Synthesize();

// Find the text
int posX = 0;
int posY = 0;
for (int x = 0; x < document.Pages.Count; x++)
{
    FR.LayoutBlocks blocks = document.Pages[x].Layout.Blocks;
    for (int y = 0; y < blocks.Count; y++)
    {
        FR.IBlock block = blocks[y];
        if (block.Type == FR.BlockTypeEnum.BT_Text)
        {
            FR.TextBlock textBlock = block.GetAsTextBlock();
            for (int z = 0; z < textBlock.Text.Paragraphs.Count; z++)
            {
                // TODO: need to use the options & regex in UIAutomationHelper
                FR.Paragraph paragraph = textBlock.Text.Paragraphs[z];
                if (paragraph.Text != text)
                    continue;

                // Find the middle point of the text
                posX = paragraph.Left + (paragraph.Right - paragraph.Left) / 2;
                posY = paragraph.Top + (paragraph.Bottom - paragraph.Top) / 2;
            }
        }
    }
}
The image I add to the document is the screenshot provided, but half the time the parts circled in red are not found in the text after document synthesis takes place. I have also tried using the "Default" engine profile.
Any ideas why these parts of the image are sometimes ignored?
Edit: I have also verified that the whole image is added to the document by exporting the document afterwards as a PDF, so it's not cutting off part of the image.
2 comments
This looks like the same question I asked recently.
Have you already checked the answer here? https://forum.ocrsdk.com/thread/some-parts-of-a-specific-pdf-are-not-ocr-ed-by-abbyy-finereader-engine/
I had not seen that answer, thank you for pointing it out. Switching to a profile where the parameters mentioned in that thread are set to true (in my case DocumentArchiving_Accuracy) solved the issue.
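For anyone landing here later, the change described above amounts to swapping the profile name in the first line of the question's code. This is a fragment rather than a complete program: it assumes the same `engineLoader` as in the question, and everything after the profile load stays as it was.

```csharp
// Assumption: engineLoader is the EngineLoader from the sample code, as in the question.
// Load the archiving profile instead of "DocumentConversion_Accuracy".
engineLoader.Engine.LoadPredefinedProfile("DocumentArchiving_Accuracy");

// The rest is unchanged from the question: create the document, add the image, process.
FR.FRDocument document = engineLoader.Engine.CreateFRDocument();
```

Whether this profile suits other kinds of input is not something this thread establishes; it is only what worked for the screenshot case here.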