Problem with text-fields in PDF

 Hi!

We uploaded a PDF (7ed300c1-a19e-47e0-83d2-e97d26f9474f) that contains text fields and images with text.
Text recognition on the images works quite well.

However, extraction of the plain text fields does not always work.

For example:
Original:
1
Allgemeinuntersuchung Pferd  [20 a]
19,00
38,48

<par>
<line baseline="1556" l="458" t="1535" r="1149" b="1561"><formatting lang="GermanStandard">
<charParams l="458" t="1537" r="479" b="1555">A</charParams>
<charParams l="481" t="1536" r="495" b="1555">l</charParams>
<charParams l="502" t="1536" r="514" b="1555">l</charParams>
<charParams l="520" t="1541" r="536" b="1561">g</charParams>
<charParams l="541" t="1541" r="556" b="1555">e</charParams>
<charParams l="560" t="1541" r="579" b="1555">m</charParams>
<charParams l="581" t="1541" r="596" b="1555">e</charParams>
<charParams l="602" t="1535" r="614" b="1555">i</charParams>
<charParams l="621" t="1541" r="637" b="1555">n</charParams>
<charParams l="640" t="1541" r="657" b="1555">u</charParams>
<charParams l="660" t="1541" r="677" b="1555">n</charParams>
<charParams l="680" t="1536" r="696" b="1555">t</charParams>
<charParams l="701" t="1541" r="716" b="1555">e</charParams>
<charParams l="721" t="1541" r="737" b="1555">r</charParams>
<charParams l="741" t="1541" r="755" b="1555">s</charParams>
<charParams l="760" t="1541" r="777" b="1555">u</charParams>
<charParams l="781" t="1541" r="797" b="1555">c</charParams>
<charParams l="800" t="1535" r="817" b="1555">h</charParams>
<charParams l="820" t="1541" r="837" b="1555">u</charParams>
<charParams l="840" t="1541" r="856" b="1555">n</charParams>
<charParams l="860" t="1541" r="876" b="1561">g</charParams>
<charParams l="877" t="1536" r="902" b="1561"> </charParams>
<charParams l="903" t="1536" r="917" b="1555">P</charParams>
<charParams l="923" t="1535" r="936" b="1555">f</charParams>
<charParams l="941" t="1541" r="956" b="1555">e</charParams>
<charParams l="962" t="1541" r="976" b="1555">r</charParams>
<charParams l="979" t="1535" r="996" b="1555">d</charParams>
<charParams l="997" t="1535" r="1046" b="1559"> </charParams>
<charParams l="1047" t="1535" r="1052" b="1559">[</charParams>
<charParams l="1061" t="1535" r="1075" b="1555" suspicious="1">2</charParams>
<charParams l="1082" t="1535" r="1096" b="1555">0</charParams>
<charParams l="1097" t="1535" r="1120" b="1555"> </charParams>
<charParams l="1121" t="1542" r="1136" b="1555">a</charParams>
<charParams l="1143" t="1536" r="1149" b="1559">]</charParams></formatting></line></par>

and

<text>
<par align="Justified">
<line baseline="1555" l="322" t="1535" r="334" b="1555"><formatting lang="GermanStandard">
<charParams l="322" t="1535" r="334" b="1555" suspicious="1">2</charParams></formatting></line></par>
</text>

 

The API recognizes the digit "1" as the character "2" and marks it with the suspicious attribute.
Could you please look into it?

Best wishes

Marc
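
For anyone who wants to check a result for the same symptom, here is a minimal sketch that lists every character the XML output flags as suspicious, together with its bounding box. It assumes the result has the structure shown in the fragments above (charParams elements with l, t, r, b and an optional suspicious attribute). The function name suspicious_chars is hypothetical, and accepting "true" in addition to "1" is an assumption, not something confirmed in this thread.

```python
import xml.etree.ElementTree as ET

# Copy of the second fragment quoted in the post above.
SAMPLE = """<text>
<par align="Justified">
<line baseline="1555" l="322" t="1535" r="334" b="1555"><formatting lang="GermanStandard">
<charParams l="322" t="1535" r="334" b="1555" suspicious="1">2</charParams></formatting></line></par>
</text>"""


def suspicious_chars(xml_text):
    """Return (char, l, t, r, b) for every charParams flagged as suspicious."""
    root = ET.fromstring(xml_text)
    found = []
    for elem in root.iter():
        # Drop an XML namespace prefix, if any, so "{ns}charParams" also matches.
        tag = elem.tag.rsplit("}", 1)[-1]
        # "1" is the value seen in the post; "true" is accepted as an assumption.
        if tag == "charParams" and elem.get("suspicious") in ("1", "true"):
            box = tuple(int(elem.get(key)) for key in ("l", "t", "r", "b"))
            found.append((elem.text or "",) + box)
    return found


print(suspicious_chars(SAMPLE))  # [('2', 322, 1535, 334, 1555)]
```

Comparing the reported box with the same region of the source page makes it easier to tell whether a flagged character is a genuine misread, as with the "1" reported as "2" here.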

Comments

1 comment

  • Permanently deleted user

    Hi Marc,

    Please try excluding Latin from the set of recognition languages.

    If the issue persists, please send the PDF that is processed incorrectly to Cloudocrsdk@abbyy.com so that we can test and investigate the issue on our side. Thank you in advance!
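
The suggestion above to drop Latin from the recognition languages could look roughly like the following when building the request parameters. This is only a sketch under assumptions: the parameter names language and exportFormat and the comma-separated language list are taken from memory of the Cloud OCR SDK processImage method and should be checked against the current documentation. The helper build_query and the example language names are hypothetical. No request is sent.

```python
from urllib.parse import urlencode


def build_query(languages, exclude=("Latin",), export_format="xml"):
    """Build a query string for the request, leaving out excluded languages."""
    kept = [lang for lang in languages if lang not in exclude]
    if not kept:
        raise ValueError("at least one recognition language must remain")
    # safe="," keeps the comma-separated language list unescaped.
    # "language" and "exportFormat" are assumed parameter names (see above).
    params = {"language": ",".join(kept), "exportFormat": export_format}
    return urlencode(params, safe=",")


print(build_query(["German", "Latin"]))  # language=German&exportFormat=xml
```

The resulting string would be appended to the processing URL used by your client. Keeping the language set as small as the document allows generally leaves the recognizer fewer similar-looking candidates to choose from.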
