What are the requirements for a project with NLP?
The following requirements must be met for a project to work:
- Document definition with properly configured NLP models;
- NLP training package on which these models were trained.
Most of the disk space is taken up by the recognition results of multi-page documents. If the models are already trained and you do not plan to retrain them, you can remove the documents from the training package. Do not remove the package itself and do not start training again.
This reduces the size of the project, at the following cost:
- The markup on which the training was done can no longer be analyzed. If a copy of the project is stored elsewhere, this is not critical;
- The model can no longer be retrained meaningfully: any new training would run without the original set of documents, and the quality of the model would likely deteriorate. If the model does not need to be updated, this is not critical either.
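Since both costs are easier to accept when a copy of the project exists, it may help to archive the project and check its size before removing the documents. The following is only a minimal sketch, assuming the project is available as an ordinary folder on disk. The function names `folder_size_bytes` and `archive_project` are hypothetical helpers written for this illustration and are not part of the product.

```python
import shutil
from pathlib import Path


def folder_size_bytes(folder: Path) -> int:
    """Return the total size, in bytes, of all regular files under `folder`."""
    return sum(p.stat().st_size for p in folder.rglob("*") if p.is_file())


def archive_project(project_dir: Path, backup_dir: Path) -> Path:
    """Zip the contents of `project_dir` into `backup_dir` and return the archive path.

    Assumption: `backup_dir` is located outside `project_dir`, so the archive
    is not written into the folder that is being archived.
    """
    backup_dir.mkdir(parents=True, exist_ok=True)
    archive_base = backup_dir / project_dir.name  # ".zip" is appended by make_archive
    archive = shutil.make_archive(str(archive_base), "zip", root_dir=str(project_dir))
    return Path(archive)
```

Calling `folder_size_bytes` on the project folder before and after removing the documents shows how much space was actually freed, and `archive_project` keeps a copy in which the original markup can still be inspected later.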