ABBYY FineReader Engine also offers a native XML format for exporting recognized document pages.
The XML export supports several options. The following settings control how much character information is written:
- XCA_None
No character attributes are to be written in files in XML format.
- XCA_Ascii
Character coordinates and character confidence are to be written in files in XML format.
- XCA_Basic
Character coordinates are to be written in files in XML format.
- XCA_Extended
Character coordinates, character confidence and extended character attributes are to be written in files in XML format. The following extended attributes are written:
- whether the word was found in the dictionary,
- whether the word was recognized with a standard or user-defined language,
- whether the word is a number,
- whether the word is an identifier,
- probability that a character is written with a Serif font,
- penalty for discordance of characters in a word,
- the mean width of stroke in the RLE representation of a word image.
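As a rough illustration of how these attributes might be consumed, the following Python sketch reads per-character records from an exported file. The element name charParams, the attribute names (l, t, r, b, charConfidence, wordFromDictionary, wordNumeric, wordIdentifier) and the namespace URI are assumptions based on the published FineReader XML schema and are not taken from this article. The embedded sample is hand-written for the example and is not real engine output.

```python
import xml.etree.ElementTree as ET

# Hand-written illustration, NOT actual FineReader output. Element names,
# attribute names and the namespace are assumptions about the schema.
SAMPLE = """<document xmlns="http://www.abbyy.com/FineReader_xml/FineReader10-schema-v1.xml">
  <page width="200" height="50">
    <block blockType="Text">
      <text><par><line><formatting lang="EnglishUnitedStates">
        <charParams l="10" t="5" r="20" b="30" charConfidence="98" wordFromDictionary="1" wordNumeric="0">H</charParams>
        <charParams l="22" t="5" r="30" b="30" charConfidence="71" wordFromDictionary="1" wordNumeric="0">i</charParams>
      </formatting></line></par></text>
    </block>
  </page>
</document>"""


def local_name(tag):
    """Drop the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def read_characters(xml_text):
    """Return one dict per charParams element, keeping only attributes that exist."""
    records = []
    for element in ET.fromstring(xml_text).iter():
        if local_name(element.tag) != "charParams":
            continue
        attrs = element.attrib
        record = {"char": element.text or ""}
        # Coordinates may be absent, depending on the export option used.
        if all(key in attrs for key in ("l", "t", "r", "b")):
            record["box"] = tuple(int(attrs[key]) for key in ("l", "t", "r", "b"))
        # Confidence is only written by some export options.
        if "charConfidence" in attrs:
            record["confidence"] = int(attrs["charConfidence"])
        # Extended word-level flags, assumed to be encoded as 1/0 or true/false.
        for flag in ("wordFromDictionary", "wordNumeric", "wordIdentifier"):
            if flag in attrs:
                record[flag] = attrs[flag] in ("1", "true")
        records.append(record)
    return records


characters = read_characters(SAMPLE)
for character in characters:
    print(character)
```

Because every attribute is optional in this sketch, the same function can be pointed at files exported with any of the settings above. Keys that a given setting does not write are simply missing from the resulting records.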
ABBYY XML Tag Scheme
In FineReader Engine, the XML structure can also store information about paragraph styles and roles in the XML file.
Simple ABBYY XML Sample
To demonstrate the differences, a sample image with the text 'Hallo World' was processed with ABBYY FineReader Server using the different XML export settings.
Comparing the results shows the basic structure of the native ABBYY XML export. A ZIP archive with the original TIFF file and the 5 different XML results is available for download.
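One way to compare such results is to list which character attributes each XML file in an archive actually contains. The Python sketch below does this for any ZIP file. The element name charParams and the attribute names in the demo data are assumptions about the export schema, and the file names are hypothetical. The demo archive is built in memory from hand-written snippets rather than from the real download.

```python
import io
import zipfile
import xml.etree.ElementTree as ET


def char_attribute_names(xml_bytes):
    """Collect the names of all attributes used on charParams elements."""
    names = set()
    for element in ET.fromstring(xml_bytes).iter():
        # Compare the local tag name so the check works with or without a namespace.
        if element.tag.rsplit("}", 1)[-1] == "charParams":
            names.update(element.attrib)
    return names


def compare_exports(zip_file):
    """Map each .xml member of the archive to its sorted charParams attribute names."""
    summary = {}
    with zipfile.ZipFile(zip_file) as archive:
        for member in archive.namelist():
            if member.lower().endswith(".xml"):
                summary[member] = sorted(char_attribute_names(archive.read(member)))
    return summary


# Demo archive with hypothetical file names and hand-written content.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr(
        "coords_only.xml",
        '<document><charParams l="1" t="2" r="3" b="4">H</charParams></document>',
    )
    archive.writestr(
        "with_confidence.xml",
        '<document><charParams l="1" t="2" r="3" b="4" charConfidence="90">H</charParams></document>',
    )
    archive.writestr("image.tif", b"not an xml file")
buffer.seek(0)

summary = compare_exports(buffer)
for name, attributes in summary.items():
    print(name, attributes)
```

To try it on a downloaded archive, pass the path of the ZIP file to compare_exports instead of the in-memory buffer.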
XML Sample:
XML Character Attributes:
Extended XML Character Attributes:
Extended ABBYY XML Sample
This ZIP archive (1.3 MB) contains the processing results and the source image.
Original:
Zip content:
Comments
6 comments
Ben Meddeb Lotfi
Hello, can I download the ZIP files?
Robert Baumgartner
Hello and thank you for your question. You can download the files by clicking the links in the text, but you can also use the links below.
TIFF-file and xmls: https://change.abbyy.com/media/30117/xml_output_halloworld.zip
Processing results and source image: https://change.abbyy.com/media/30118/abbyy_xml_sample_collection.zip
Ben Meddeb Lotfi
Hello, when I click the URL, I get this message in my browser
Robert Baumgartner
Thank you for the image. In this case, let's try again with a changed URL:
https://abbyy.com/media/30118/abbyy_xml_sample_collection.zip
https://abbyy.com/media/30117/xml_output_halloworld.zip
Ben Meddeb Lotfi
Thank you! Now I can download the ZIP files.
Andrius Balsevičius
Hi.
I see this article was edited only 2 months ago, yet I can't find the XML format among the supported formats in the ABBYY FineReader 16 trial version.
Am I missing something?
Best regards
Andrius