I want to convert PDFs and images to text files and extract data from them. All of my documents contain both English and Thai text. I have tried several options to extract the text:
Option 1 : --lang=English,Thai --profile=textExtraction
Option 2 : --lang=Thai --profile=textExtraction
Option 3 : --lang=English,Thai --profile=documentConversion
Option 4 : --lang=Thai --profile=documentConversion
There were many mismatches between the input data and the output text. Option 4 gives the most accurate conversion, but the English text is lost in this case. Is there any way I can upload a single file and receive two output files, one for English and one for Thai? Otherwise I will have to upload the file twice.
Comments
If you need to extract the English text from your image first and then the Thai text, you can run the task twice with different settings. In this case, re-recognition is performed free of charge.
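A minimal sketch of the two-pass idea, assuming the processing call accepts `language`, `profile` and `exportFormat` query parameters. The endpoint URL is a hypothetical placeholder, `build_pass_urls` is a hypothetical helper, and the sketch only builds the two request URLs. It does not upload or send anything, and it does not show how the second pass refers to the already uploaded file.

```python
from urllib.parse import urlencode

# Hypothetical placeholder endpoint; substitute the processing URL for your account.
PROCESS_URL = "https://example.invalid/processImage"


def build_pass_urls(profile="documentConversion", export_format="txt"):
    """Return one request URL per language pass for the same source file.

    Assumes the service takes `language`, `profile` and `exportFormat`
    as query parameters; check the API reference before relying on this.
    """
    passes = {"english": "English", "thai": "Thai"}  # one pass per language
    urls = {}
    for label, language in passes.items():
        query = urlencode({
            "language": language,
            "profile": profile,
            "exportFormat": export_format,
        })
        urls[label] = f"{PROCESS_URL}?{query}"
    return urls


if __name__ == "__main__":
    for label, url in build_pass_urls().items():
        print(label, url)
```

Keeping the profile and export format identical across both passes means the two outputs differ only in the recognition language, which makes them easier to compare.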
That said, the issue may be related to the quality of the source image. Please check whether your source image is of sufficient quality for OCR, and review our Best Practices section, which contains tips on how to scan or photograph documents to achieve the best recognition results.
If the structure of your documents is not very important, it is better to use the textExtraction profile and export the result to TXT or XML (if you need to perform further processing on your side).
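Since the goal is separate English and Thai files, one possible form of that further processing is to recognize once with `English,Thai`, export to TXT, and split the result locally by Unicode script. This is a swapped-in post-processing technique, not a feature of the OCR service, and it does not improve recognition accuracy. The token-level rule is an assumption you may need to adjust: a token with any Thai character (Unicode block U+0E00 to U+0E7F) goes to the Thai output, a token with ASCII letters goes to the English output, and digits or punctuation go to both. The helper names are hypothetical.

```python
import sys
from typing import Tuple

THAI_BLOCK = range(0x0E00, 0x0E80)  # Unicode "Thai" block, U+0E00..U+0E7F


def has_thai(token: str) -> bool:
    """True if the token contains at least one Thai-block character."""
    return any(ord(ch) in THAI_BLOCK for ch in token)


def has_latin(token: str) -> bool:
    """True if the token contains at least one ASCII letter."""
    return any(ch.isascii() and ch.isalpha() for ch in token)


def split_by_script(text: str) -> Tuple[str, str]:
    """Split recognized text into (english, thai), preserving line order."""
    english_lines, thai_lines = [], []
    for line in text.splitlines():
        en, th = [], []
        for token in line.split():
            if has_thai(token):
                th.append(token)
            elif has_latin(token):
                en.append(token)
            else:
                # Digits and punctuation: keep them in both outputs.
                en.append(token)
                th.append(token)
        # Emit a line only if it has real text in that script, not just numbers.
        if any(has_latin(t) for t in en):
            english_lines.append(" ".join(en))
        if any(has_thai(t) for t in th):
            thai_lines.append(" ".join(th))
    return "\n".join(english_lines), "\n".join(thai_lines)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python split_script.py result.txt -> result.en.txt, result.th.txt
    src = sys.argv[1]
    with open(src, encoding="utf-8") as f:
        english, thai = split_by_script(f.read())
    stem = src.rsplit(".", 1)[0]
    with open(stem + ".en.txt", "w", encoding="utf-8") as f:
        f.write(english)
    with open(stem + ".th.txt", "w", encoding="utf-8") as f:
        f.write(thai)
```

Thai does not put spaces between words, so a whitespace token may hold several Thai words; that is fine here because whole tokens are routed, not individual words.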
For additional recommendations, please send the images on which the issue can be reproduced to CloudOCRSDK@abbyy.com.