ABBYY FineReader Engine & XML Export

ABBYY FineReader Engine also offers a native XML format for exporting recognized document pages.
The XML export supports several options. As an example, the following settings control how much character information is written:
  • XCA_None
    No character attributes are to be written in files in XML format.
  • XCA_Ascii
    Character coordinates and character confidence are to be written in files in XML format.
  • XCA_Basic
    Character coordinates are to be written in files in XML format.
  • XCA_Extended
    Character coordinates, character confidence and extended character attributes are to be written in files in XML format. The following extended attributes are written:
      • whether the word was found in the dictionary,
      • whether the word was recognized with a standard or user-defined language,
      • whether the word is a number,
      • whether the word is an identifier,
      • probability that a character is written with a Serif font,
      • penalty for discordance of characters in a word,
      • the mean width of stroke in the RLE representation of a word image.
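
As a rough illustration of how these levels appear to a program that consumes the export, the following Python sketch reads per-character information from a small hand-written XML fragment. The element name `charParams` and the attribute names (`l`, `t`, `r`, `b`, `charConfidence`, `wordFromDictionary`, `serifProbability`) are assumptions based on commonly seen ABBYY XML output, and the fragment is not real engine output. Check the names against your own export and the schema it references.

```python
import xml.etree.ElementTree as ET

# Hand-written fragment for illustration only (not real engine output).
# Element and attribute names are assumptions; verify them against your export.
SAMPLE = """
<line>
  <charParams l="10" t="20" r="30" b="50" charConfidence="98"
              wordFromDictionary="1" serifProbability="12">H</charParams>
  <charParams l="32" t="20" r="48" b="50">a</charParams>
</line>
"""

BOX_KEYS = ("l", "t", "r", "b")


def local_name(tag):
    """Strip an XML namespace prefix such as '{uri}charParams'."""
    return tag.rsplit("}", 1)[-1]


def read_chars(xml_text):
    """Return one dict per character element found in the XML text."""
    chars = []
    for elem in ET.fromstring(xml_text.strip()).iter():
        if local_name(elem.tag) != "charParams":
            continue
        # Coordinates are only present if the export wrote them.
        box = None
        if all(k in elem.attrib for k in BOX_KEYS):
            box = tuple(int(elem.attrib[k]) for k in BOX_KEYS)
        # Confidence is optional as well.
        conf = elem.attrib.get("charConfidence")
        # Everything else is treated as an extended attribute.
        extended = {
            k: v
            for k, v in elem.attrib.items()
            if k not in BOX_KEYS and k != "charConfidence"
        }
        chars.append(
            {
                "char": elem.text or "",
                "box": box,
                "confidence": int(conf) if conf is not None else None,
                "extended": extended,
            }
        )
    return chars


if __name__ == "__main__":
    for entry in read_chars(SAMPLE):
        print(entry)
```

Matching on the local element name keeps the sketch independent of the namespace URI, which can differ between product versions.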

xml_scheme.png

 

ABBYY XML Tag Scheme

In FineReader Engine, the XML export can also store information about paragraph styles and roles in the XML file.

abbyy-xml-tag-scheme-illu.png

 

Simple ABBYY XML Sample

halloworld.png

To demonstrate the differences, the image above, which contains the text 'Hallo World', was processed with ABBYY FineReader Server using different XML export settings:

xml_export_settings_rs2.png

Processing the image with different options illustrates the basic structure of the native ABBYY XML export. You can download a ZIP archive with the original TIFF file and the 5 different XML results here.
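
One way to see what each export setting adds is to list the attribute names that occur on the character elements of each result file and compare the sets. The Python sketch below does that. The element name `charParams` is an assumption based on commonly seen ABBYY XML output, the inline XML strings in the usage lines are hand-written stand-ins, and any file paths you pass to `compare_exports` are your own.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def char_attribute_names(xml_data, element="charParams"):
    """Collect the attribute names used on all elements with the given local name.

    xml_data may be str or bytes; the element name is an assumption.
    """
    names = set()
    for elem in ET.fromstring(xml_data).iter():
        # Compare local names so the namespace URI does not matter.
        if elem.tag.rsplit("}", 1)[-1] == element:
            names.update(elem.attrib)
    return names


def compare_exports(paths):
    """Map each file name to the sorted attribute names found in that file."""
    result = {}
    for p in paths:
        # Read bytes so the parser can handle a byte order mark or an
        # encoding declaration itself.
        result[Path(p).name] = sorted(char_attribute_names(Path(p).read_bytes()))
    return result


if __name__ == "__main__":
    # Hand-written stand-ins for two export results (not real engine output).
    basic = '<page><charParams l="1" t="2" r="3" b="4">H</charParams></page>'
    extended = (
        '<page><charParams l="1" t="2" r="3" b="4" '
        'charConfidence="90" wordNumeric="0">H</charParams></page>'
    )
    added = char_attribute_names(extended) - char_attribute_names(basic)
    print(sorted(added))
```

The set difference shows only the attributes that the richer setting adds, which is easier to scan than a line-by-line diff of two XML files.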

 

XML Sample:

01_xml_halloworld_simple.png

XML Character Attributes:

02_xml_halloworld_character_attributes.png

Extended XML Character Attributes:

03_xml_halloworld_extended_cha_attributes.png

 

Extended ABBYY XML Sample

This ZIP archive (1.3 MB) contains the processing results and the source image.

Original:

demoimage1.jpg

Zip content:

abbyy_xml_sample_content.png

 
