Extract English and Thai text

Written by Permanently deleted user

October 10, 2017 12:01

I want to convert PDFs and images to text files and extract data from them. All of my documents contain both English and Thai text. I have tried various options to extract the text.

Option 1 : --lang=English,Thai --profile=textExtraction

Option 2 : --lang=Thai --profile=textExtraction

Option 3 : --lang=English,Thai --profile=documentConversion

Option 4 : --lang=Thai --profile=documentConversion

There were a lot of mismatches between the input data and the output text. Option 4 gives the most accurate conversion, but the English text is lost in that case. Is there any way I can upload a single file and receive two output files, one for English and one for Thai? Otherwise I will have to upload the file twice.
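One possible local workaround, which is not an ABBYY feature and is not mentioned in this thread, is to recognize the file once with both languages enabled and then split the resulting text into two files by script. The Python sketch below assumes plain-text OCR output and classifies each line by whether it contains more Thai characters (Unicode block U+0E00 to U+0E7F) or more Latin letters; lines with neither, such as numbers, go to both files. The names `split_by_script` and `write_outputs` and the `_en`/`_th` file naming are hypothetical.

```python
from __future__ import annotations

import re
from pathlib import Path

# Thai Unicode block: U+0E00 to U+0E7F. Latin is approximated by ASCII letters.
THAI_CHAR = re.compile(r"[\u0E00-\u0E7F]")
LATIN_CHAR = re.compile(r"[A-Za-z]")


def split_by_script(text: str) -> tuple[str, str]:
    """Split OCR text into (english, thai) by the dominant script of each line."""
    english, thai = [], []
    for line in text.splitlines():
        n_thai = len(THAI_CHAR.findall(line))
        n_latin = len(LATIN_CHAR.findall(line))
        if n_thai == 0 and n_latin == 0:
            # Numbers, punctuation, or blank lines: keep them in both outputs.
            english.append(line)
            thai.append(line)
        elif n_thai >= n_latin:
            thai.append(line)
        else:
            english.append(line)
    return "\n".join(english), "\n".join(thai)


def write_outputs(text: str, stem: str) -> None:
    """Write <stem>_en.txt and <stem>_th.txt as UTF-8 (hypothetical file naming)."""
    english, thai = split_by_script(text)
    Path(f"{stem}_en.txt").write_text(english, encoding="utf-8")
    Path(f"{stem}_th.txt").write_text(thai, encoding="utf-8")
```

Line-level splitting is a simplification: a line that mixes both languages goes entirely to whichever script dominates it, and this does not recover any accuracy lost by enabling both languages during recognition.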


1 comment

Permanently deleted user

October 11, 2017 11:46
If you need to extract the English text from your image first and then the Thai text, you can call the task twice with different settings. In this case, the re-recognition is performed free of charge.
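The two recognition passes can be sketched as two sets of request parameters. This Python sketch assumes the Cloud OCR SDK `processImage` method at `https://cloud.ocrsdk.com/processImage` with `language`, `profile`, and `exportFormat` query parameters; verify these names against the SDK's API reference. It only builds the request URLs: it does not authenticate, upload the file, or show how the second pass reuses the first upload. `build_pass_urls` is a hypothetical helper.

```python
from urllib.parse import urlencode

# Assumed endpoint of the Cloud OCR SDK processImage method; verify against the API reference.
PROCESS_IMAGE_URL = "https://cloud.ocrsdk.com/processImage"

# One (language, profile) pair per recognition pass. Option 4 from the question
# (Thai with documentConversion) could be used for the Thai pass instead.
DEFAULT_PASSES = (("English", "textExtraction"), ("Thai", "textExtraction"))


def build_pass_urls(passes=DEFAULT_PASSES, export_format="txt"):
    """Return {language: request URL}, one URL per recognition pass."""
    urls = {}
    for language, profile in passes:
        params = {"language": language, "profile": profile, "exportFormat": export_format}
        urls[language] = f"{PROCESS_IMAGE_URL}?{urlencode(params)}"
    return urls


if __name__ == "__main__":
    for language, url in build_pass_urls().items():
        print(language, url)
```

Each pass yields its own result, so the English and Thai text arrive as separate files, and each pass can use whichever profile worked best for that language.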

Anyway, the issue may be related to the quality of the source image. Please check whether your source image has adequate quality for OCR, and review our Best Practices section, where you can find our tips on how to scan or photograph documents to achieve the best recognition results.

If the structure of your documents is not very important, it is better to use the textExtraction profile and export the result to TXT, or to XML if you need to perform further processing on your side.

And to get our additional recommendations, please send the images for which the issue can be reproduced to CloudOCRSDK@abbyy.com.
