[FR Engine 11 SDK] UTF-8 exported files have a BOM character at the beginning of the text file.

Hi,

We just noticed that when exporting to a UTF-8 text file, FineReader Engine adds a BOM (Byte Order Mark) at the beginning of the file.

page.Export(tempTxtFile.getAbsolutePath(), FileExportFormatEnum.FEF_TextUnicodeDefaults, exportParams);

This BOM (the bytes EF BB BF in UTF-8) marks the text as Unicode.

But with UTF-8 it is optional and not recommended (ref. Unicode Standard 5.0). This matters especially for Java, whose UTF-8 decoder assumes files have no BOM and does not strip it. When the file is read, the BOM ends up in the text as a stray leading character, typically displayed as ?, which is really annoying.

More information here: http://www.rgagnon.com/javadetails/java-handle-utf8-file-with-bom.html

Currently we have a workaround (http://stackoverflow.com/questions/4897876/reading-utf-8-bom-marker), but it would be nice to consider removing the BOM in the future or making it optional ;)
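For anyone hitting the same issue, here is a minimal sketch of this kind of workaround: check the first three bytes of the exported file for EF BB BF and skip them before decoding. It assumes the file is small enough to read into memory; the class and method names (`Utf8BomReader`, `decodeSkippingBom`, `readSkippingBom`) are made up for illustration and are not part of the FineReader Engine SDK.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class Utf8BomReader {

    /** Decodes UTF-8 bytes, dropping a leading BOM (EF BB BF) if one is present. */
    public static String decodeSkippingBom(byte[] bytes) {
        int offset = 0;
        if (bytes.length >= 3
                && (bytes[0] & 0xFF) == 0xEF
                && (bytes[1] & 0xFF) == 0xBB
                && (bytes[2] & 0xFF) == 0xBF) {
            offset = 3; // skip the three BOM bytes
        }
        return new String(bytes, offset, bytes.length - offset, StandardCharsets.UTF_8);
    }

    /** Reads a whole file as UTF-8 text, ignoring a leading BOM. */
    public static String readSkippingBom(Path file) throws IOException {
        return decodeSkippingBom(Files.readAllBytes(file));
    }

    public static void main(String[] args) throws IOException {
        // Simulate an exported file that starts with a UTF-8 BOM.
        Path tmp = Files.createTempFile("bom-demo", ".txt");
        Files.write(tmp, new byte[] {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'h', 'i'});
        System.out.println(readSkippingBom(tmp)); // prints "hi"
        Files.delete(tmp);
    }
}
```

In our case this would be called on the file written by `page.Export(...)`, for example `Utf8BomReader.readSkippingBom(tempTxtFile.toPath())`. Files without a BOM are decoded unchanged, so the same code keeps working if the BOM is ever made optional.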

Comments

1 comment

  • Permanently deleted user

    Sorry for the delayed response.

    We have passed your suggestion to our analysts and filed a request to make the BOM character optional. Unfortunately, we do not yet know when this feature will be available; we hope it will be implemented in a future version.
