Data Manager (DM) is a utility that prepares your data for upload to the ABBYY Timeline application. It lets you combine, transform, cleanse, and sanitize (de-sensitize) data on your local machine or network before uploading it to the cloud.
The utility is a local executable file, so you do not need to worry about the security of the raw data or of its transfer over the internet. The data and the executable both remain on the local machine, behind your own corporate firewall.
The utility can perform the following operations on data:
- Access multiple data sources, including CSV and XLSX files and relational databases
- Merge (de-normalize) data from the same source or from multiple sources
- Remove records that meet a specific condition, such as a missing or out-of-range value
- Create additional compound fields by concatenating several other fields
- Perform basic transformations within a field:
  - Trim spaces
  - Convert case
  - Remove or replace specific substrings
- Sanitize data:
  - Perform a one-way hash on a sensitive field
  - Replace a name with a fake name
  - Mask a string, keeping only its first X characters
  - Encode the string with a password *
- Save the result to a CSV file for review and upload
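To make the field-level operations more concrete, here is a minimal Python sketch of trimming, case conversion, one-way hashing, and masking. It is not DM's implementation or configuration syntax: the function names, the choice of SHA-256, and the decision to mask every remaining character with an asterisk are assumptions made for illustration only.

```python
import hashlib


def clean_field(value: str) -> str:
    """Trim surrounding spaces and convert the value to upper case."""
    return value.strip().upper()


def hash_field(value: str) -> str:
    """One-way hash (SHA-256 is an assumption): equal inputs give equal
    digests, so hashed keys still match, but the original is not recoverable."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


def mask_keep_first(value: str, keep: int, mask_char: str = "*") -> str:
    """Keep the first `keep` characters and replace each remaining one with mask_char."""
    hidden = max(len(value) - keep, 0)
    return value[:keep] + mask_char * hidden


cleaned = clean_field("  acme corp ")
digest = hash_field("111-22-3333")
masked = mask_keep_first("Elkins", 3)
```

Because the hash is deterministic, a hashed field can still serve as a join key or grouping value after sanitization, which is usually the reason to hash rather than simply drop a sensitive column.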
These capabilities fall into the following key types of operations:
- Extract (load) a dataset: connect to a DBMS and run a query, or load a file
- Join several datasets into one via the key fields
- Filter a dataset by a field value
- Create a compound field
- Transform the field value
- Generate a CSV file
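The filter and join operations can be pictured with a short Python sketch over lists of records. This is only an illustration under stated assumptions: the helper names `filter_rows` and `join_rows`, the sample field names, and the inner-join behavior (unmatched rows are dropped) are hypothetical and are not taken from DM itself.

```python
def filter_rows(rows, field, keep):
    """Keep only the rows whose `field` value satisfies the predicate `keep`."""
    return [row for row in rows if keep(row.get(field))]


def join_rows(left, right, left_key, right_key):
    """Inner join: merge each left row with every right row sharing its key.
    If both sides have a field with the same name, the right value wins."""
    index = {}
    for right_row in right:
        index.setdefault(right_row[right_key], []).append(right_row)
    joined = []
    for left_row in left:
        for right_row in index.get(left_row[left_key], []):
            joined.append({**left_row, **right_row})
    return joined


# Made-up sample data: order 2 has a missing amount.
orders = [
    {"order_id": "1", "cust": "C1", "amount": "40"},
    {"order_id": "2", "cust": "C2", "amount": ""},
    {"order_id": "3", "cust": "C1", "amount": "15"},
]
customers = [
    {"cust_id": "C1", "region": "West"},
    {"cust_id": "C2", "region": "East"},
]

valid = filter_rows(orders, "amount", lambda v: bool(v))  # drop missing amounts
merged = join_rows(valid, customers, "cust", "cust_id")   # de-normalize
```

The join copies the customer fields onto every matching order row, which is what de-normalizing means here: one flat record per order instead of two related tables.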
You can perform these operations in any sequence, for example:
- Connect to DBMS1 and execute a query “SELECT…” to get dataset A.
- Load file File.csv to get dataset B.
- Trim spaces from field A.f1.
- Create a new field A.f3 by concatenating A.f1, a space, and A.f2.
- Join dataset A to B via the key fields A.f3 and B.f1.
- Hash field A.f4.
- Mask field B.f2 so that only its first three characters remain, like Elk***.
- Produce a CSV file with fields A.f1, A.f2, …, B.f1, B.f2…
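The example sequence above can be sketched end to end in Python using only the standard library. Everything concrete here is an assumption for illustration: an in-memory SQLite database stands in for DBMS1, the table `t`, its query, and all sample rows are made up, File.csv is simulated with an in-memory string, and SHA-256 is used as the hash. DM's own configuration and behavior may differ.

```python
import csv
import hashlib
import io
import sqlite3

# Stand-in for DBMS1 (assumption): in-memory SQLite with made-up rows.
conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row
conn.execute("CREATE TABLE t (f1 TEXT, f2 TEXT, f4 TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [("  Ann", "Lee", "111-22-3333"), ("Bob ", "Ray", "444-55-6666")],
)
# Hypothetical query producing dataset A.
dataset_a = [dict(row) for row in conn.execute("SELECT f1, f2, f4 FROM t")]

# Stand-in for File.csv (assumption): dataset B from an in-memory string.
file_csv = io.StringIO("f1,f2\nAnn Lee,Elkins\nBob Ray,Dayton\n")
dataset_b = list(csv.DictReader(file_csv))

for row in dataset_a:
    row["f1"] = row["f1"].strip()            # trim spaces from A.f1
    row["f3"] = row["f1"] + " " + row["f2"]  # compound field A.f3

# Join A to B on A.f3 = B.f1 (this sketch assumes B.f1 is unique).
b_by_key = {b["f1"]: b for b in dataset_b}
joined = []
for a in dataset_a:
    b = b_by_key.get(a["f3"])
    if b is not None:
        record = {f"A.{k}": v for k, v in a.items()}
        record.update({f"B.{k}": v for k, v in b.items()})
        joined.append(record)

for record in joined:
    # One-way hash of A.f4.
    record["A.f4"] = hashlib.sha256(record["A.f4"].encode("utf-8")).hexdigest()
    # Keep the first three characters of B.f2 and mask the rest.
    value = record["B.f2"]
    record["B.f2"] = value[:3] + "*" * max(len(value) - 3, 0)

# Write the result as CSV (to a string here; a real run would write a file).
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=list(joined[0].keys()), lineterminator="\n")
writer.writeheader()
writer.writerows(joined)
result_csv = out.getvalue()
```

Prefixing the output columns with `A.` and `B.` keeps fields such as f1 and f2, which exist in both datasets, from overwriting each other after the join.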