Problem with text-fields in PDF

Hi!

We uploaded a PDF (7ed300c1-a19e-47e0-83d2-e97d26f9474f) that contains text fields as well as images with text.
Recognition of the text in the images works quite well.

However, extraction of the plain text fields does not work reliably.

For example, the original row in the PDF reads:
1
Allgemeinuntersuchung Pferd  [20 a]
19,00
38,48

The XML output for the description line is:
<par>
<line baseline="1556" l="458" t="1535" r="1149" b="1561"><formatting lang="GermanStandard">
<charParams l="458" t="1537" r="479" b="1555">A</charParams>
<charParams l="481" t="1536" r="495" b="1555">l</charParams>
<charParams l="502" t="1536" r="514" b="1555">l</charParams>
<charParams l="520" t="1541" r="536" b="1561">g</charParams>
<charParams l="541" t="1541" r="556" b="1555">e</charParams>
<charParams l="560" t="1541" r="579" b="1555">m</charParams>
<charParams l="581" t="1541" r="596" b="1555">e</charParams>
<charParams l="602" t="1535" r="614" b="1555">i</charParams>
<charParams l="621" t="1541" r="637" b="1555">n</charParams>
<charParams l="640" t="1541" r="657" b="1555">u</charParams>
<charParams l="660" t="1541" r="677" b="1555">n</charParams>
<charParams l="680" t="1536" r="696" b="1555">t</charParams>
<charParams l="701" t="1541" r="716" b="1555">e</charParams>
<charParams l="721" t="1541" r="737" b="1555">r</charParams>
<charParams l="741" t="1541" r="755" b="1555">s</charParams>
<charParams l="760" t="1541" r="777" b="1555">u</charParams>
<charParams l="781" t="1541" r="797" b="1555">c</charParams>
<charParams l="800" t="1535" r="817" b="1555">h</charParams>
<charParams l="820" t="1541" r="837" b="1555">u</charParams>
<charParams l="840" t="1541" r="856" b="1555">n</charParams>
<charParams l="860" t="1541" r="876" b="1561">g</charParams>
<charParams l="877" t="1536" r="902" b="1561"> </charParams>
<charParams l="903" t="1536" r="917" b="1555">P</charParams>
<charParams l="923" t="1535" r="936" b="1555">f</charParams>
<charParams l="941" t="1541" r="956" b="1555">e</charParams>
<charParams l="962" t="1541" r="976" b="1555">r</charParams>
<charParams l="979" t="1535" r="996" b="1555">d</charParams>
<charParams l="997" t="1535" r="1046" b="1559"> </charParams>
<charParams l="1047" t="1535" r="1052" b="1559">[</charParams>
<charParams l="1061" t="1535" r="1075" b="1555" suspicious="1">2</charParams>
<charParams l="1082" t="1535" r="1096" b="1555">0</charParams>
<charParams l="1097" t="1535" r="1120" b="1555"> </charParams>
<charParams l="1121" t="1542" r="1136" b="1555">a</charParams>
<charParams l="1143" t="1536" r="1149" b="1559">]</charParams></formatting></line></par>

The XML output for the separate field containing "1" is:

<text>
<par align="Justified">
<line baseline="1555" l="322" t="1535" r="334" b="1555"><formatting lang="GermanStandard">
<charParams l="322" t="1535" r="334" b="1555" suspicious="1">2</charParams></formatting></line></par>
</text>

The API recognizes the digit "1" as the character "2" and marks it with the attribute suspicious="1".
Could you please look into this?
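
One way to list every character flagged like this in an XML result is a small script. The following is a minimal Python sketch using the standard library module xml.etree.ElementTree. It assumes the exported XML uses the element and attribute names shown above (line, charParams, suspicious and the l/t/r/b box coordinates); real exports may declare an XML namespace, so the sketch compares local tag names only. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Return the tag without an XML namespace, e.g. '{ns}charParams' -> 'charParams'."""
    return tag.rsplit("}", 1)[-1]


def suspicious_chars(xml_text):
    """List (line_text, char, (l, t, r, b)) for every character flagged suspicious="1"."""
    root = ET.fromstring(xml_text)
    results = []
    for line in root.iter():
        if local_name(line.tag) != "line":
            continue
        chars = [c for c in line.iter() if local_name(c.tag) == "charParams"]
        # Rebuild the line's text from its individual characters (spaces included).
        line_text = "".join(c.text or "" for c in chars)
        for c in chars:
            if c.get("suspicious") == "1":
                # Pixel coordinates of the character's bounding box.
                box = tuple(int(c.get(k)) for k in ("l", "t", "r", "b"))
                results.append((line_text, c.text, box))
    return results
```

For a whole result file, pass the raw bytes, e.g. suspicious_chars(Path("result.xml").read_bytes()) with pathlib.Path; the file name is a placeholder. Each entry pairs the rebuilt line text with the flagged character, so misreads such as a "1" recognized as "2" can be reviewed by hand.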

Best wishes

Marc

Comments

    Permanently deleted user

    Hi Marc,

    Please try excluding Latin from the set of recognition languages.

    If the issue persists, please send the PDF document that is processed incorrectly to Cloudocrsdk@abbyy.com so that we can test and investigate the issue on our side. Thank you in advance!

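
Excluding Latin means changing the language parameter of the recognition request. Below is a minimal Python sketch, assuming the file is submitted through the REST processImage method with its language and exportFormat query parameters, and that the original language list contained Latin (for example German,Latin). The host and parameter names are assumptions to verify against the Cloud OCR SDK API reference; the helper process_image_url is hypothetical, and authentication and the file upload are left out.

```python
from urllib.parse import urlencode

# Assumed endpoint: the Cloud OCR SDK has used region-specific hosts, so replace
# this with the host listed for your application.
BASE_URL = "https://cloud.ocrsdk.com/processImage"


def process_image_url(languages, export_format="xml"):
    """Build a processImage URL whose recognition languages exclude Latin."""
    # Keep every requested language except Latin (compared case-insensitively).
    kept = [lang.strip() for lang in languages if lang.strip().lower() != "latin"]
    if not kept:
        raise ValueError("at least one recognition language is required")
    # urlencode percent-encodes the comma separator as %2C.
    query = urlencode({"language": ",".join(kept), "exportFormat": export_format})
    return f"{BASE_URL}?{query}"
```

For example, process_image_url(["German", "Latin"]) yields https://cloud.ocrsdk.com/processImage?language=German&exportFormat=xml. Keeping only the languages actually present in the document narrows the character set the engine considers, which is the intent of the suggestion.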
