I was thinking of using the service to generate records from received invoices, which carry the same information in many different formats, including free-text documents written by consultants. Since they all share the fields required by law, I was planning to use regexp queries, but I had overlooked that field extraction works well only when the field position is accurately specified.
Are you planning to work toward identifying fields by their expected characteristics, such as nearby titles or expected type and size, or should I map the entire page to a database of words and their positions and manage the search myself?
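To make the "manage the search myself" option concrete, here is a rough sketch of what I have in mind. Everything in it is my own assumption and not part of the service: the `Word` record, the `find_value_near_label` helper, the coordinate units, and the tolerances are all hypothetical. It only assumes I can get each word of the page together with its position.

```python
import re
from dataclasses import dataclass


@dataclass
class Word:
    # Hypothetical record: one recognized word and the position of its top-left corner.
    text: str
    x: float
    y: float


def find_value_near_label(words, label, pattern, max_dx=200.0, max_dy=5.0):
    """Return the nearest word to the right of `label`, on roughly the
    same line, whose text fully matches the regex `pattern`.

    Returns None when no label or no matching value is found.
    """
    regex = re.compile(pattern)
    # A label matches case-insensitively, ignoring a trailing colon.
    labels = [w for w in words if w.text.lower().rstrip(":") == label.lower()]

    best, best_dx = None, None
    for lab in labels:
        for w in words:
            dx = w.x - lab.x          # horizontal offset from the label
            dy = abs(w.y - lab.y)     # vertical offset (same line if small)
            if 0 < dx <= max_dx and dy <= max_dy and regex.fullmatch(w.text):
                if best_dx is None or dx < best_dx:
                    best, best_dx = w, dx
    return best.text if best else None


# Made-up sample page, only for illustration.
words = [
    Word("Invoice", 10, 20),
    Word("Date:", 10, 40),
    Word("2015-03-01", 60, 40),
    Word("Total:", 10, 100),
    Word("EUR", 60, 101),
    Word("1234.50", 90, 101),
]

total = find_value_near_label(words, "Total", r"\d+\.\d{2}")
date = find_value_near_label(words, "Date", r"\d{4}-\d{2}-\d{2}")
```

The idea is that the regexp checks the expected type of the value, while the label and the distance limits replace the fixed field position, so the same query could work across differently laid out invoices.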
Thanks