I have packages of pages, let's say 50 per package. They are all forms.
I need to identify the form I'm interested in within each package of 50, i.e. classify the page images. All of the forms are based on fixed templates.
Once I've got the right page, I then extract a text snippet.
I can see how the second part happens, but not the first. Any pointers on adding a form or document classification stage to a processing pipeline?
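
To make the question concrete, here is a naive sketch of the kind of classification stage I have in mind. It assumes each page is already loaded as a grayscale pixel grid and that one clean reference image exists per template. The names `signature`, `classify`, `find_page`, the 4x4 grid, and the distance threshold are all hypothetical choices for illustration, not taken from any library. I don't know whether something this simple would hold up on real scans.

```python
from typing import Dict, List, Optional, Sequence

# A grayscale page as rows of pixel values (0-255). Assumed input format.
Image = Sequence[Sequence[float]]


def signature(img: Image, grid: int = 4) -> List[float]:
    """Reduce a page to grid x grid block means: a coarse layout fingerprint.

    Assumes the image is at least `grid` pixels in each dimension.
    """
    h, w = len(img), len(img[0])
    sig = []
    for gy in range(grid):
        for gx in range(grid):
            y0, y1 = gy * h // grid, (gy + 1) * h // grid
            x0, x1 = gx * w // grid, (gx + 1) * w // grid
            block = [img[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            sig.append(sum(block) / len(block))
    return sig


def distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean absolute difference between two signatures."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def classify(page: Image, templates: Dict[str, List[float]]) -> str:
    """Return the name of the template whose signature is closest to the page."""
    sig = signature(page)
    return min(templates, key=lambda name: distance(sig, templates[name]))


def find_page(
    package: Sequence[Image],
    templates: Dict[str, List[float]],
    wanted: str,
    max_distance: float = 20.0,  # hypothetical threshold, would need tuning
) -> Optional[int]:
    """Return the index of the first page matching `wanted`, or None."""
    for i, page in enumerate(package):
        sig = signature(page)
        best = min(templates, key=lambda name: distance(sig, templates[name]))
        if best == wanted and distance(sig, templates[best]) <= max_distance:
            return i
    return None
```

The idea would be to build `templates` once from the reference images, run `find_page` over each package of 50, and hand the returned page to the text extraction step. A block-mean fingerprint ignores skew, scale, and filled-in handwriting, so it is probably only a starting point.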