Question
How does Multilevel Document Analysis (MDA) work?
Answer
In many cases, a document is more complex than a page of black text printed on a white background. Modern layouts often include elements such as tables, pictures, headers and footers, background images, and so on. To recognize such documents and preserve their formatting, today's OCR programs first analyze the structure of the document before they start reading it. During this process, several logical levels are singled out. At the top of the hierarchy there is only one object: the page itself. The levels, in descending order, are:
- page
- table, text block
- table cell
- paragraph, picture
- line
- word, picture within a line
- letter (character)
Any object in this hierarchy is composed of lower-level objects, e.g.:
- letters make words
- words make lines, etc.
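
The hierarchy described above can be pictured as a tree in which every object holds objects from a lower level. The following is only a minimal sketch of that idea: the names `Level` and `Node` are hypothetical and invented for illustration, and nothing here reflects how any particular OCR product stores its data.

```python
from dataclasses import dataclass, field
from enum import IntEnum


class Level(IntEnum):
    """Levels from the list above; a smaller value means a higher level."""
    PAGE = 0
    BLOCK = 1        # table or text block
    CELL = 2         # table cell
    PARAGRAPH = 3    # paragraph or picture
    LINE = 4
    WORD = 5         # word or picture within a line
    CHARACTER = 6


@dataclass
class Node:
    """One object in the hierarchy (hypothetical structure, for illustration)."""
    level: Level
    text: str = ""   # only filled in for characters in this sketch
    children: list["Node"] = field(default_factory=list)

    def add(self, child: "Node") -> "Node":
        # An object may only contain objects from a lower level.
        if child.level <= self.level:
            raise ValueError(
                f"{child.level.name} cannot be placed inside {self.level.name}"
            )
        self.children.append(child)
        return child


# Letters make words, words make lines.
word = Node(Level.WORD)
for ch in "OCR":
    word.add(Node(Level.CHARACTER, text=ch))

line = Node(Level.LINE)
line.add(word)
```

Trying to place a higher-level object inside a lower-level one, for example `word.add(line)`, raises a `ValueError` in this sketch.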
The OCR technology analyzes the document from the top down:
- it first divides the page into larger objects
- which are in turn divided into smaller objects, etc., until it reaches the level of characters.
Once the characters have been singled out and recognized, a reverse process begins: the program assembles individual objects into larger objects, which are finally combined into the complete page. This procedure is called Multilevel Document Analysis, or MDA.
If an OCR program makes an error at one of the higher levels of analysis (e.g., if it mistakes a paragraph for a picture), there is very little chance that it will produce the right result, and the document will be recognized incorrectly.
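
The two passes, and the effect of a high-level mistake, can be illustrated with a toy example. This is a sketch under strong assumptions, not a real OCR pipeline: the "page" is already a plain string, so no character recognition takes place, only paragraphs, lines, words and characters are modeled, and the function names and the `mistaken_for_picture` parameter are hypothetical.

```python
def analyse(page: str, mistaken_for_picture: frozenset = frozenset()) -> list:
    """Top-down pass: split the page into ever smaller objects."""
    tree = []
    for index, paragraph in enumerate(page.split("\n\n")):
        if index in mistaken_for_picture:
            # Treated as a picture: never split further, so never read.
            tree.append(None)
            continue
        tree.append(
            [[list(word) for word in line.split()]
             for line in paragraph.splitlines()]
        )
    return tree


def assemble(tree: list) -> str:
    """Bottom-up pass: characters -> words -> lines -> paragraphs -> page."""
    parts = []
    for paragraph in tree:
        if paragraph is None:
            parts.append("[picture]")
            continue
        parts.append(
            "\n".join(
                " ".join("".join(chars) for chars in line)
                for line in paragraph
            )
        )
    return "\n\n".join(parts)


page = "Multilevel Document\nAnalysis\n\nSecond paragraph"

# Correct analysis: the page is rebuilt unchanged.
restored = assemble(analyse(page))

# The second paragraph is mistaken for a picture at a high level,
# so its text is lost no matter how well the lower levels work.
damaged = assemble(analyse(page, mistaken_for_picture=frozenset({1})))
```

In this sketch `restored` equals the original `page`, while `damaged` contains the placeholder `[picture]` where the second paragraph used to be: the lower levels never get a chance to correct a decision made above them.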