Dataveil
HEADQUARTERS: Melbourne, Australia
FOUNDED: 2007
FUNDING:
FOUNDERS: Stuart Green, Robin Green
EMPLOYEES:
PRIVATE | PUBLIC: Private
COTS | OSS: COTS
USE CASES: Test Data Management (TDM), Test Data Creation (TDC)
GTM Insights
GTM Domain Insights
Data de-identification is an evolving domain, driven by customer expectations that are expressed through privacy regulations (e.g., GDPR, PIPL). Many use cases require de-identification, and the emphasis on re-identification risk keeps increasing.
GDPR is a leader in setting privacy regulation expectations. Under GDPR, even if all direct identifiers are stripped out of a data set, the data is still considered personal data if any data subject can be linked to information in the data set relating to them (as per Recital 26 GDPR). In other words, a person does not have to be named to be identifiable. If other information allows an individual to be connected to data about them, they may still be considered ‘identified’.
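The linkability point can be made concrete with a small sketch. The Python below is illustrative only: the records, the column names, and the choice of postcode, birth year, and sex as quasi-identifiers are all made-up assumptions. It shows how a table with no names can still be joined to a second source that does carry names.

```python
# All data here is hypothetical. The "de-identified" table has no names,
# but it still carries quasi-identifiers that also appear in a public register.
deidentified = [
    {"postcode": "3000", "birth_year": 1984, "sex": "F", "diagnosis": "asthma"},
    {"postcode": "3000", "birth_year": 1984, "sex": "M", "diagnosis": "diabetes"},
    {"postcode": "3121", "birth_year": 1990, "sex": "F", "diagnosis": "migraine"},
    {"postcode": "3121", "birth_year": 1990, "sex": "F", "diagnosis": "eczema"},
]
public_register = [
    {"name": "Alice Example", "postcode": "3000", "birth_year": 1984, "sex": "F"},
    {"name": "Bob Example", "postcode": "3000", "birth_year": 1984, "sex": "M"},
    {"name": "Carol Example", "postcode": "3121", "birth_year": 1990, "sex": "F"},
]

# Assumed quasi-identifier columns shared by both sources.
QUASI_IDENTIFIERS = ("postcode", "birth_year", "sex")


def qi_key(row):
    """Project a row onto its quasi-identifier values."""
    return tuple(row[column] for column in QUASI_IDENTIFIERS)


def link(deid_rows, register_rows):
    """Return {name: record} for people matching exactly one de-identified record."""
    by_key = {}
    for row in deid_rows:
        by_key.setdefault(qi_key(row), []).append(row)

    linked = {}
    for person in register_rows:
        matches = by_key.get(qi_key(person), [])
        if len(matches) == 1:  # a unique match singles the person out
            linked[person["name"]] = matches[0]
    return linked


reidentified = link(deidentified, public_register)
for name, record in reidentified.items():
    print(name, "->", record["diagnosis"])
```

In this toy data, two of the three people in the register match exactly one record each and are re-identified, while the third matches two records and stays ambiguous.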
Of the vendors in this space, here are some notables:
Delphix - Delphix's differentiation is its snapshot-tree approach to virtual databases: creating whole, unique databases with a minimal storage footprint. Its engines can be deployed ephemerally, it has a broad set of masking generators, and it can integrate with CI/CD for automation. Its challenge is that it does NOT have synthetic data (on roadmap) and does NOT provide data subsetting capability (not on roadmap). It is best practice to subset data to the specific test being conducted, and to use synthetic data for some FR tests (unit/system) and NFR tests (load, endurance, performance).
Use Delphix if you need full copies of databases in the organization.
Tonic.ai - Tonic's differentiation is synthetic data generation (best in market), with a very broad set of generators, and its ability to provide differential privacy. Its solution can evaluate new combinations of data in test sets to determine whether row linking or singling out has become possible, and then add calibrated noise to the aggregated data set. Tonic's challenge is that it does NOT provide data virtualization or virtual database capability, focusing ONLY on synthetic data generation and privacy.
Use Tonic if you favor data security (synthetic data and differential privacy), but be cautious about the usability of the resulting data.
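As a rough illustration of what "calibrated noise" means in differential privacy, the sketch below applies the textbook Laplace mechanism to a counting query. It is not a description of Tonic's implementation; the function names, the sample ages, and the epsilon value are assumptions chosen for the example.

```python
import random


def laplace_noise(scale, rng):
    """Draw one Laplace(0, scale) sample.

    The difference of two independent exponential variables with mean
    `scale` follows a Laplace distribution with that scale.
    """
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def private_count(rows, predicate, epsilon, rng):
    """Return a noisy count of rows matching `predicate`.

    Adding or removing one person changes a count by at most 1, so the
    sensitivity is 1 and the noise scale is calibrated as sensitivity / epsilon.
    """
    true_count = sum(1 for row in rows if predicate(row))
    sensitivity = 1
    return true_count + laplace_noise(sensitivity / epsilon, rng)


# Hypothetical data: ages of seven test subjects.
ages = [23, 35, 41, 52, 67, 29, 38]
rng = random.Random(42)  # seeded only so the sketch is repeatable
noisy = private_count(ages, lambda age: age >= 40, epsilon=0.5, rng=rng)
print(f"true count = 3, noisy count = {noisy:.2f}")
```

A smaller epsilon means a larger noise scale, which gives stronger privacy and less accurate answers. That trade-off is the usability caution noted above.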
K2View - K2View's differentiation is its data virtualization. Creating a composite view from many data sources provides a powerful abstraction layer. K2View generates synthetic data across a broad set of sources and, because it owns the abstraction layer, it can also perform dynamic masking, which Tonic and Delphix cannot. K2View's challenge is its general applicability: it does not go as deep into data security as Tonic, nor is it as focused on creating copies of underlying databases as Delphix.
Source Types:
Databases (such as Oracle, Microsoft SQL Server, MySQL, and Azure SQL), files (such as JSON and CSV)
Masking (classifiers & generators):
K-Anonymity and L-Diversity: Yes, through relationship obfuscation
Custom Masking: Yes, 28 masks out-of-the-box
Data Classification: Yes
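For readers unfamiliar with the K-Anonymity and L-Diversity entry above, the following sketch shows how the two measures are commonly computed for a table. It is a generic illustration rather than Dataveil's method; the rows, column names, and helper functions are hypothetical.

```python
from collections import defaultdict


def equivalence_classes(rows, quasi_identifiers):
    """Group rows that share the same quasi-identifier values."""
    classes = defaultdict(list)
    for row in rows:
        classes[tuple(row[c] for c in quasi_identifiers)].append(row)
    return classes


def k_anonymity(rows, quasi_identifiers):
    """k is the size of the smallest equivalence class (rows must be non-empty)."""
    classes = equivalence_classes(rows, quasi_identifiers)
    return min(len(group) for group in classes.values())


def l_diversity(rows, quasi_identifiers, sensitive):
    """l is the fewest distinct sensitive values found in any equivalence class."""
    classes = equivalence_classes(rows, quasi_identifiers)
    return min(len({row[sensitive] for row in group}) for group in classes.values())


# Hypothetical, already-generalized records.
rows = [
    {"age_band": "30-39", "postcode": "30**", "diagnosis": "asthma"},
    {"age_band": "30-39", "postcode": "30**", "diagnosis": "diabetes"},
    {"age_band": "30-39", "postcode": "30**", "diagnosis": "asthma"},
    {"age_band": "40-49", "postcode": "31**", "diagnosis": "migraine"},
    {"age_band": "40-49", "postcode": "31**", "diagnosis": "migraine"},
]
qi = ("age_band", "postcode")

k = k_anonymity(rows, qi)
l = l_diversity(rows, qi, "diagnosis")
print(f"k = {k}, l = {l}")
```

Here every record shares its quasi-identifiers with at least one other record, yet the second group holds a single diagnosis, so anyone known to be in that group has their diagnosis revealed. This is why l-diversity is usually checked alongside k-anonymity.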
Automation Support:
API Support: Yes
CI/CD Integrations: Yes
Deployment Types:
SaaS or On-Premises: Not specified
Container Deployment: Not specified
Data Management:
Source Code Management: Yes, saved as plain XML text
Subsetting: Not specified
Referential Integrity: Yes, preserves statistical and syntactical properties
Automated Provisioning: Yes
Data Masking Reports: Yes, preview before & after masked values, side-by-side, without overwriting original data
Synthetic Test Data: Yes
Data Virtualization: Yes
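One common way masking tools keep referential integrity is deterministic masking: the same input always maps to the same masked value, so foreign keys still join after masking. The sketch below illustrates that general technique with a keyed hash. It is an assumption-laden example, not Dataveil's algorithm, and the key, tables, and `mask_id` helper are hypothetical. It also does not attempt to preserve the statistical or syntactical properties mentioned above.

```python
import hashlib
import hmac

# Hypothetical secret; in practice this would come from a secrets store.
SECRET_KEY = b"demo-key-change-me"


def mask_id(value, key=SECRET_KEY):
    """Map an identifier to a stable pseudonym using HMAC-SHA256."""
    digest = hmac.new(key, str(value).encode("utf-8"), hashlib.sha256).hexdigest()
    return "C" + digest[:12]


# Hypothetical parent and child tables linked by customer_id.
customers = [
    {"customer_id": 101, "name": "Alice Example"},
    {"customer_id": 102, "name": "Bob Example"},
]
orders = [
    {"order_id": 1, "customer_id": 101},
    {"order_id": 2, "customer_id": 101},
    {"order_id": 3, "customer_id": 102},
]

# Mask the key column the same way in both tables and redact the name.
masked_customers = [
    {**c, "customer_id": mask_id(c["customer_id"]), "name": "REDACTED"}
    for c in customers
]
masked_orders = [{**o, "customer_id": mask_id(o["customer_id"])} for o in orders]

# Every masked order should still point at a masked customer.
known_ids = {c["customer_id"] for c in masked_customers}
orphans = [o for o in masked_orders if o["customer_id"] not in known_ids]
print("orphaned orders after masking:", len(orphans))
```

Because the mapping depends on a secret key, the pseudonyms cannot be recomputed from the original IDs without that key, while joins between the masked tables continue to work.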
Pricing: https://www.dataveil.com/data-masking-software-prices/
User Guide: https://www.dataveil.com/user-guide/