Hi!
We uploaded a PDF (7ed300c1-a19e-47e0-83d2-e97d26f9474f) that contains text fields and images with text.
Recognition of the text in the images works quite well.
But extraction of the plain text fields does not always work.
For example:
Original:
1
Allgemeinuntersuchung Pferd [20 a]
19,00
38,48
<par>
<line baseline="1556" l="458" t="1535" r="1149" b="1561"><formatting lang="GermanStandard">
<charParams l="458" t="1537" r="479" b="1555">A</charParams>
<charParams l="481" t="1536" r="495" b="1555">l</charParams>
<charParams l="502" t="1536" r="514" b="1555">l</charParams>
<charParams l="520" t="1541" r="536" b="1561">g</charParams>
<charParams l="541" t="1541" r="556" b="1555">e</charParams>
<charParams l="560" t="1541" r="579" b="1555">m</charParams>
<charParams l="581" t="1541" r="596" b="1555">e</charParams>
<charParams l="602" t="1535" r="614" b="1555">i</charParams>
<charParams l="621" t="1541" r="637" b="1555">n</charParams>
<charParams l="640" t="1541" r="657" b="1555">u</charParams>
<charParams l="660" t="1541" r="677" b="1555">n</charParams>
<charParams l="680" t="1536" r="696" b="1555">t</charParams>
<charParams l="701" t="1541" r="716" b="1555">e</charParams>
<charParams l="721" t="1541" r="737" b="1555">r</charParams>
<charParams l="741" t="1541" r="755" b="1555">s</charParams>
<charParams l="760" t="1541" r="777" b="1555">u</charParams>
<charParams l="781" t="1541" r="797" b="1555">c</charParams>
<charParams l="800" t="1535" r="817" b="1555">h</charParams>
<charParams l="820" t="1541" r="837" b="1555">u</charParams>
<charParams l="840" t="1541" r="856" b="1555">n</charParams>
<charParams l="860" t="1541" r="876" b="1561">g</charParams>
<charParams l="877" t="1536" r="902" b="1561"> </charParams>
<charParams l="903" t="1536" r="917" b="1555">P</charParams>
<charParams l="923" t="1535" r="936" b="1555">f</charParams>
<charParams l="941" t="1541" r="956" b="1555">e</charParams>
<charParams l="962" t="1541" r="976" b="1555">r</charParams>
<charParams l="979" t="1535" r="996" b="1555">d</charParams>
<charParams l="997" t="1535" r="1046" b="1559"> </charParams>
<charParams l="1047" t="1535" r="1052" b="1559">[</charParams>
<charParams l="1061" t="1535" r="1075" b="1555" suspicious="1">2</charParams>
<charParams l="1082" t="1535" r="1096" b="1555">0</charParams>
<charParams l="1097" t="1535" r="1120" b="1555"> </charParams>
<charParams l="1121" t="1542" r="1136" b="1555">a</charParams>
<charParams l="1143" t="1536" r="1149" b="1559">]</charParams></formatting></line></par>
and
<text>
<par align="Justified">
<line baseline="1555" l="322" t="1535" r="334" b="1555"><formatting lang="GermanStandard">
<charParams l="322" t="1535" r="334" b="1555" suspicious="1">2</charParams></formatting></line></par>
</text>
The API recognizes the digit "1" as the character "2" and marks it with the attribute suspicious="1".
Can you please check it?
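To make such cases easier to spot, here is a minimal Python sketch that rebuilds the text of each `<line>` from its `<charParams>` elements and lists the characters flagged as suspicious. It assumes the XML has the shape shown above; the sample string, the `read_lines` helper and the accepted values `"1"` and `"true"` for the `suspicious` attribute are illustrative assumptions, not part of the SDK.

```python
import xml.etree.ElementTree as ET

# Shortened sample in the same shape as the output quoted above (assumption:
# real responses may carry an XML namespace, so tags are compared by local name).
SAMPLE = """<par>
<line baseline="1556" l="1047" t="1535" r="1149" b="1561"><formatting lang="GermanStandard">
<charParams l="1047" t="1535" r="1052" b="1559">[</charParams>
<charParams l="1061" t="1535" r="1075" b="1555" suspicious="1">2</charParams>
<charParams l="1082" t="1535" r="1096" b="1555">0</charParams>
<charParams l="1097" t="1535" r="1120" b="1555"> </charParams>
<charParams l="1121" t="1542" r="1136" b="1555">a</charParams>
<charParams l="1143" t="1536" r="1149" b="1559">]</charParams></formatting></line></par>"""


def local_name(tag):
    """Strip an optional '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def read_lines(xml_text):
    """Return one (text, suspicious) tuple per <line> element.

    'suspicious' is a list of (index_in_line, character, left_coordinate)
    for every charParams element flagged as suspicious.
    """
    root = ET.fromstring(xml_text)
    results = []
    for line in root.iter():
        if local_name(line.tag) != "line":
            continue
        chars = [el for el in line.iter() if local_name(el.tag) == "charParams"]
        text = "".join(c.text or "" for c in chars)
        suspicious = [
            (i, c.text, int(c.get("l")))
            for i, c in enumerate(chars)
            # Assumption: the flag is written as "1" (as above) or "true".
            if c.get("suspicious") in ("1", "true")
        ]
        results.append((text, suspicious))
    return results


for text, suspicious in read_lines(SAMPLE):
    print(repr(text), suspicious)
```

Flagged characters could then be checked manually or compared against the text layer of the PDF.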
Best wishes
Marc
1 comment
Hi Marc,
Kindly try to exclude the Latin language from the set of recognition languages.
If the issue persists, please send the PDF document that is being processed incorrectly to Cloudocrsdk@abbyy.com so that we can test and investigate the issue on our side. Thank you in advance!
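As a rough illustration of the language suggestion above, the following Python sketch builds a request query string with Latin removed from the recognition languages. The parameter names `language` and `exportFormat` are assumed to match the processing method you call, and `build_query` is a hypothetical helper, so please check both against the Cloud OCR SDK documentation.

```python
from urllib.parse import urlencode


def build_query(languages, exclude=("Latin",), export_format="xml"):
    """Build a query string without the excluded recognition languages.

    Assumption: the service accepts a comma-separated 'language' parameter
    and an 'exportFormat' parameter; verify the names in the SDK docs.
    """
    kept = [lang for lang in languages if lang not in exclude]
    if not kept:
        raise ValueError("at least one recognition language must remain")
    return urlencode({"language": ",".join(kept), "exportFormat": export_format})


# Latin is dropped, German is kept.
print(build_query(["German", "Latin"]))
```

The resulting string would be appended to the URL of the processing request.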