How much data is needed to train a document classification or extraction model?

The answer depends on the number of document types and on how many variations each type has. A Document skill can be trained starting from the very first sample document. A Classification skill requires a few samples for at least two classes before it can be trained. Vantage is optimized to quickly recognize major differences between images, so you can start designing your skill with a small amount of data. ABBYY recommends starting with 3-5 images per document variation. Depending on the complexity and variability of the documents and the accuracy required, you may need several hundred samples. Sample images can be uploaded at design time or collected while the skill runs in production, which allows the training set to grow and the skill model to be rebuilt over time.
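As a rough illustration of the 3-5 images per variation guideline, the arithmetic for an initial training set can be sketched as follows. This is only a planning aid under stated assumptions: the function name, the document types, and the variation counts are all hypothetical, and nothing here is part of Vantage or its API.

```python
# Hypothetical planning helper; not part of ABBYY Vantage or its API.
# It only applies the "3-5 images per document variation" starting guideline.
MIN_PER_VARIATION = 3
MAX_PER_VARIATION = 5


def starting_sample_range(variations_per_type):
    """Return (low, high) image counts for an initial training set.

    variations_per_type maps a document type name to the number of
    distinct layout variations expected for that type.
    """
    total_variations = sum(variations_per_type.values())
    return (
        total_variations * MIN_PER_VARIATION,
        total_variations * MAX_PER_VARIATION,
    )


# Made-up example: three document types with 4, 2, and 1 variations.
variations = {"invoice": 4, "purchase_order": 2, "receipt": 1}
low, high = starting_sample_range(variations)
print(f"Start with roughly {low}-{high} images")
```

The result is a starting point only. Complex or highly variable documents, or stricter accuracy targets, may still call for several hundred samples.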
