No character attributes are to be written in files in XML format.
Character coordinates and character confidence are to be written in files in XML format.
Character coordinates are to be written in files in XML format.
Character coordinates, character confidence and extended character attributes are to be written in files in XML format. The following extended attributes are written:
- whether the word was found in the dictionary,
- whether the word was recognized with a standard or user-defined language,
- whether the word is a number,
- whether the word is an identifier,
- probability that a character is written with a Serif font,
- penalty for discordance of characters in a word,
- the mean width of stroke in the RLE representation of a word image.
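To make the attribute levels above more concrete, here is a minimal sketch of how a consumer might read character coordinates, confidence and extended attributes from an exported file. It is not taken from ABBYY documentation. The element name `charParams`, the attribute names (`l`, `t`, `r`, `b`, `charConfidence`, `wordFromDictionary`, `wordNormal`, `wordNumeric`, `wordIdentifier`, `serifProbability`, `wordPenalty`, `meanStrokeWidth`), the namespace URL and the inline sample are assumptions modelled on the FineReader 10 XML schema and may differ in your version; `read_chars` and its helpers are hypothetical names used only for illustration.

```python
import xml.etree.ElementTree as ET

# Hand-written sample, NOT real engine output. Element and attribute
# names are assumed from the FineReader 10 schema.
SAMPLE = """<document xmlns="http://www.abbyy.com/FineReader_xml/FineReader10-schema-v1.xml">
  <page>
    <block blockType="Text">
      <text><par><line><formatting>
        <charParams l="10" t="20" r="30" b="50" charConfidence="98"
                    wordFromDictionary="true" wordNormal="true"
                    wordNumeric="false" wordIdentifier="false"
                    serifProbability="12" wordPenalty="0"
                    meanStrokeWidth="107">H</charParams>
        <charParams l="32" t="28" r="48" b="50" charConfidence="41">a</charParams>
      </formatting></line></par></text>
    </block>
  </page>
</document>"""


def local_name(tag):
    """Strip an XML namespace prefix of the form '{uri}name'."""
    return tag.rsplit("}", 1)[-1]


def as_int(value):
    """Convert an attribute string to int, keeping None for absent attributes."""
    return None if value is None else int(value)


def as_bool(value):
    """Convert 'true'/'false' to bool, keeping None for absent attributes."""
    return None if value is None else value == "true"


def read_chars(xml_text):
    """Collect one dict per charParams element found anywhere in the document."""
    root = ET.fromstring(xml_text)
    chars = []
    for el in root.iter():
        if local_name(el.tag) != "charParams":
            continue
        a = el.attrib
        chars.append({
            "char": el.text or "",
            # Character coordinates: left, top, right, bottom.
            "box": tuple(as_int(a.get(k)) for k in ("l", "t", "r", "b")),
            "confidence": as_int(a.get("charConfidence")),
            # Extended attributes; None when the export level omits them.
            "from_dictionary": as_bool(a.get("wordFromDictionary")),
            "standard_language": as_bool(a.get("wordNormal")),
            "is_number": as_bool(a.get("wordNumeric")),
            "is_identifier": as_bool(a.get("wordIdentifier")),
            "serif_probability": as_int(a.get("serifProbability")),
            "word_penalty": as_int(a.get("wordPenalty")),
            "mean_stroke_width": as_int(a.get("meanStrokeWidth")),
        })
    return chars


if __name__ == "__main__":
    for ch in read_chars(SAMPLE):
        print(ch["char"], ch["box"], ch["confidence"], ch["from_dictionary"])
```

Absent attributes are returned as `None` rather than a default, so the same reader works on files exported with any of the attribute levels described above.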
ABBYY XML Tag Scheme
In FineReader Engine, the XML export can also save information about paragraph styles and roles to the XML file.
Simple ABBYY XML Sample
To demonstrate the differences, the above image with the text 'Hallo World' was processed with ABBYY FineReader Server using the different XML export settings:
Processing the image with the different options demonstrates the basic structure of the native ABBYY XML export. A ZIP archive containing the original TIFF file and the five different XML results is available for download.
XML Character Attributes:
Extended XML Character Attributes:
Extended ABBYY XML Sample
This ZIP archive (1.3 MB) contains the processing results and the source image.