I have images containing this kind of content:
long_text_description_1 number1a number1b number1c
long_text_description_2 number2a number2b number2c
long_text_description_3 number3c
long_text_description_4 number4a number4b number4c
...
and I would like to get the recognised output in the same order. Instead the API gives me a TXT file with column-based results, something like:
long_text_description_1
long_text_description_2
long_text_description_3
long_text_description_4
number1a
number2a
number4a
number1b
number2b
number4b
number1c
number2c
number3c
number4c
which makes me lose all correspondence between the texts and the numbers. I can no longer tell which number belongs to which text description.
I just want to read the image line by line. What is the easiest way to achieve this?
My problem occurs when using TXT as the output format. When I ask for XLSX and open the resulting file, the row and column information is there, but XLSX is not a format that a program can easily consume. CSV would be good, but it does not seem to be available.
So, what is the best way to get the plain text from my image in a line-by-line fashion?
Thanks in advance.
Comments
Also, using processTextField instead doesn't work, since it seems to be limited to 200 characters per image. Unless that limit can be changed somewhere?
You could export to XML and process the words with their coordinates on your side. The region parameter of the processTextField method could be set to the coordinates corresponding to each line.
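A minimal sketch of the coordinate-based approach suggested above, assuming the recognised words (or lines) and their bounding boxes have already been read out of the exported XML. The `Fragment` tuple, the `group_into_rows` function, and the `tolerance` parameter are hypothetical names used for illustration, not part of the OCR SDK, and the sample coordinates are made up. The idea is simply to group fragments whose vertical centres are close together, then order each group from left to right.

```python
from typing import NamedTuple


class Fragment(NamedTuple):
    """A recognised piece of text with its bounding box (hypothetical structure)."""
    text: str
    left: int
    top: int
    right: int
    bottom: int


def group_into_rows(fragments, tolerance=0.5):
    """Rebuild visual rows: top to bottom, then left to right within a row.

    A fragment joins the current row when its vertical centre lies within
    tolerance * (its own height) of the centre of the row's first fragment.
    """
    def centre(f):
        return (f.top + f.bottom) / 2

    rows = []
    for frag in sorted(fragments, key=centre):
        height = frag.bottom - frag.top
        if rows and abs(centre(frag) - centre(rows[-1][0])) <= tolerance * height:
            rows[-1].append(frag)
        else:
            rows.append([frag])
    # Order each row by its left edge and join the texts with a space.
    return [" ".join(f.text for f in sorted(row, key=lambda f: f.left))
            for row in rows]


# Made-up sample in column order, as in the TXT output described in the question.
sample = [
    Fragment("long_text_description_1", 10, 100, 300, 120),
    Fragment("long_text_description_2", 10, 130, 300, 150),
    Fragment("long_text_description_3", 10, 160, 300, 180),
    Fragment("number1a", 320, 101, 380, 121),
    Fragment("number2a", 320, 129, 380, 149),
    Fragment("number1c", 480, 100, 540, 120),
    Fragment("number3c", 480, 161, 540, 181),
]

rows = group_into_rows(sample)
for row in rows:
    print(row)
```

The tolerance is a guess that depends on font size and skew; for tilted scans it may need to be larger, or the grouping could compare against the previous fragment in the row rather than the first one.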
Hi, any developments on this? XLSX provides excellent results, but CSV or JSON is essential at this point.
We recommend either using export to XML or waiting about a month for beta testing of the receipt recognition feature.
More details are in a previous thread.