[FR Engine 11 SDK] UTF-8 exported files have a BOM character at the beginning of the text file.

Hi,

We just noticed that when exporting to a UTF-8 text file, FineReader Engine adds a BOM (Byte Order Mark) at the beginning of the file.

page.Export(tempTxtFile.getAbsolutePath(), FileExportFormatEnum.FEF_TextUnicodeDefaults, exportParams);

This BOM (the bytes EF BB BF) signals that the text is encoded as Unicode, here UTF-8.

But in UTF-8 the BOM is optional and not recommended (ref. Unicode Standard 5.0). This is especially a problem in Java, which assumes that UTF-8 files don't have a BOM: when the file is read, the BOM is not stripped and ends up as a stray leading character (shown as ?), which is really annoying.

More info here: http://www.rgagnon.com/javadetails/java-handle-utf8-file-with-bom.html

Currently we have a workaround (http://stackoverflow.com/questions/4897876/reading-utf-8-bom-marker), but it would be nice to consider removing the BOM in the future, or making it optional ;)
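
For anyone hitting the same issue, here is a minimal sketch of this kind of workaround. It is not necessarily what the linked answer does, and the class and method names (BomStripper, stripBom, decodeUtf8WithoutBom, readUtf8WithoutBom) are made up for illustration. It relies on the fact that Java's UTF-8 decoder keeps the BOM as a leading U+FEFF character, so it can simply be dropped after decoding:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of the FineReader Engine API.
public class BomStripper {

    // The UTF-8 bytes EF BB BF decode to this single character.
    private static final char BOM = '\uFEFF';

    // Returns the text without a leading BOM; text without a BOM is returned unchanged.
    public static String stripBom(String text) {
        if (!text.isEmpty() && text.charAt(0) == BOM) {
            return text.substring(1);
        }
        return text;
    }

    // Decodes raw bytes as UTF-8 and drops a leading BOM if one is present.
    public static String decodeUtf8WithoutBom(byte[] bytes) {
        return stripBom(new String(bytes, StandardCharsets.UTF_8));
    }

    // Reads the whole exported file into memory and drops a leading BOM if one is present.
    public static String readUtf8WithoutBom(Path file) throws IOException {
        return decodeUtf8WithoutBom(Files.readAllBytes(file));
    }
}
```

With this, something like BomStripper.readUtf8WithoutBom(tempTxtFile.toPath()) could be called right after page.Export(...). Reading the whole file into memory is only reasonable for small files; for large exports a streaming approach would probably be better.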

Comments

1 comment

  • Permanently deleted user

    Sorry for the delayed response.

    We have passed your suggestion to our analysts and created a request to make the BOM optional. Unfortunately, we do not yet know when this feature will be available, but we hope it will be implemented in a future version.
