
Dataveil

Enterprise Applicability
HEADQUARTERS: Melbourne, Australia

FOUNDED: 2007

FUNDING: Not specified

FOUNDERS: Stuart Green, Robin Green

EMPLOYEES: Not specified

PRIVATE | PUBLIC: Private

COTS | OSS: COTS

USE CASES

Test Data Management (TDM), Test Data Creation (TDC)

GTM Insights

GTM Domain Insights


Data de-identification is an evolving domain driven by customer expectations, which are expressed through privacy regulations (e.g., GDPR, PIPL). Many use cases require de-identification, with an ever-increasing emphasis on re-identification risk.


GDPR is a leading indicator of privacy regulation expectations. Under GDPR, even if all direct identifiers are stripped out of a data set, the data is still considered personal data if it is possible to link any data subject to information in the data set relating to them (as per Recital 26 GDPR). In other words, a person does not have to be named to be identifiable. If other information allows an individual to be connected to data about them, they may still be considered ‘identified’.
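To make the linkage risk concrete, here is a minimal sketch of how a data set with names removed can still be tied back to individuals. The records, field names, and the `link_records` helper are all hypothetical and exist only for illustration; the point is that a unique match on indirect attributes (quasi-identifiers) is enough to re-identify someone.

```python
# Hypothetical "de-identified" records: the name column has been removed.
deidentified = [
    {"zip": "3000", "birth_year": 1985, "sex": "F", "diagnosis": "asthma"},
    {"zip": "3000", "birth_year": 1985, "sex": "M", "diagnosis": "diabetes"},
    {"zip": "3051", "birth_year": 1990, "sex": "F", "diagnosis": "migraine"},
]

# Hypothetical auxiliary data an attacker might already hold.
public = [
    {"name": "Alice Example", "zip": "3000", "birth_year": 1985, "sex": "F"},
    {"name": "Bob Example", "zip": "3051", "birth_year": 1972, "sex": "M"},
]

QUASI_IDENTIFIERS = ("zip", "birth_year", "sex")


def link_records(deidentified_rows, auxiliary_rows, keys):
    """Return (name, row) pairs where the quasi-identifiers match exactly one row."""
    index = {}
    for row in deidentified_rows:
        index.setdefault(tuple(row[k] for k in keys), []).append(row)

    matches = []
    for person in auxiliary_rows:
        candidates = index.get(tuple(person[k] for k in keys), [])
        if len(candidates) == 1:  # a unique match singles the person out
            matches.append((person["name"], candidates[0]))
    return matches


reidentified = link_records(deidentified, public, QUASI_IDENTIFIERS)
for name, row in reidentified:
    print(name, "->", row["diagnosis"])
```

In this toy data, one person is re-identified even though no name appears in the de-identified set, which is the situation Recital 26 is concerned with.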


Of the vendors in this space, here are some notables:


  • Delphix - Delphix's differentiation is in their snapshot tree approach to virtual databases, which creates whole, unique databases with a minimal storage footprint. Their engines can be deployed ephemerally, they offer broad masking generators, and they can integrate with CI/CD for automation. Their challenge is that they do NOT have synthetic data (on roadmap) and do NOT provide data subsetting capability (not on roadmap). It is best practice to subset data to the specific test being conducted, and to use synthetic data for some functional requirements (FRs, e.g., unit and system tests) and non-functional requirements (NFRs, e.g., load, endurance, and performance tests).


Use Delphix if you need a full copy of databases in the organization.

  • Tonic.ai - Tonic's differentiation is in synthetic data generation (best in market), with a very broad set of generators, and in their ability to provide differential privacy. Their solution can evaluate new combinations of data in test sets to determine whether row linking or singling out may now be possible, and add calibrated noise to the aggregated data set. Tonic's challenge is that they do NOT provide data virtualization or virtual database capability, focusing instead on synthetic data generation and privacy ONLY.

Use Tonic if you favor data security (synthetic data and differential privacy), though be cautious about the 'usability' of the data.

  • K2View - K2View's differentiation is its data virtualization. Creating a composite view from many data sources provides a powerful abstraction layer. K2View generates synthetic data across a broad set of sources and, by owning the abstraction layer, can also perform dynamic masking, which Tonic and Delphix cannot. K2View's challenge is its general applicability: it is not as deep into data security as Tonic, nor as focused on creating copies of underlying databases as Delphix.
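The "calibrated noise" mentioned in the Tonic.ai entry above refers to the general idea behind differential privacy. The sketch below shows the textbook Laplace mechanism applied to a count query. It is not Tonic's implementation; the function names, the sample ages, and the epsilon value are assumptions chosen for illustration.

```python
import random


def laplace_noise(scale, rng):
    """Draw one sample from Laplace(0, scale).

    The difference of two independent exponential variables with mean
    `scale` follows a Laplace distribution with that scale.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(values, predicate, epsilon, rng):
    """Return a noisy count of the values that satisfy `predicate`."""
    true_count = sum(1 for v in values if predicate(v))
    sensitivity = 1.0  # adding or removing one person changes a count by at most 1
    return true_count + laplace_noise(sensitivity / epsilon, rng)


# Hypothetical data: ages of six people.
ages = [23, 35, 41, 52, 67, 38]
rng = random.Random(0)

noisy = private_count(ages, lambda age: age >= 40, epsilon=1.0, rng=rng)
print(f"noisy count of people aged 40+: {noisy:.2f}")
```

A smaller epsilon means a larger noise scale, so privacy improves while the released figure becomes less accurate. This is the trade-off behind the caution about data 'usability' noted above.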




Source Types:

  • Databases (such as Oracle, Microsoft SQL Server, MySQL, and Azure SQL), files (such as JSON and CSV)


Masking (classifiers & generators):

  • K-Anonymity and L-Diversity: Yes, through relationship obfuscation

  • Custom Masking: Yes, 28 masks out-of-the-box

  • Data Classification: Yes
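For readers unfamiliar with k-anonymity and l-diversity, the following minimal sketch shows how the two properties can be measured on a table. It does not represent Dataveil's relationship obfuscation; the rows, column names, and helper functions are hypothetical.

```python
from collections import defaultdict


def equivalence_classes(rows, quasi_identifiers):
    """Group rows that share the same quasi-identifier values."""
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[q] for q in quasi_identifiers)].append(row)
    return groups


def k_anonymity(rows, quasi_identifiers):
    """Size of the smallest group: each row is hidden among at least k rows."""
    groups = equivalence_classes(rows, quasi_identifiers)
    return min(len(group) for group in groups.values())


def l_diversity(rows, quasi_identifiers, sensitive):
    """Fewest distinct sensitive values found in any group."""
    groups = equivalence_classes(rows, quasi_identifiers)
    return min(len({row[sensitive] for row in group}) for group in groups.values())


# Hypothetical table after generalizing postcode and age.
rows = [
    {"zip": "30**", "age_band": "30-39", "diagnosis": "asthma"},
    {"zip": "30**", "age_band": "30-39", "diagnosis": "diabetes"},
    {"zip": "30**", "age_band": "40-49", "diagnosis": "asthma"},
    {"zip": "30**", "age_band": "40-49", "diagnosis": "asthma"},
]

k = k_anonymity(rows, ("zip", "age_band"))               # 2
l = l_diversity(rows, ("zip", "age_band"), "diagnosis")  # 1
print(f"k = {k}, l = {l}")
```

The table is 2-anonymous, yet its l-diversity is only 1: everyone in the 40-49 group has the same diagnosis, so knowing a person belongs to that group reveals the sensitive value. This is why the two measures are usually reported together.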


Automation Support:

  • API Support: Yes

  • CI/CD Integrations: Yes


Deployment Types:

  • SaaS or On-Premises: Not specified

  • Container Deployment: Not specified


Data Management:

  • Source Code Management: Yes, saved as plain XML text

  • Subsetting: Not specified

  • Referential Integrity: Yes, preserves statistical and syntactical properties

  • Automated Provisioning: Yes

  • Data Masking Reports: Yes, preview before & after masked values, side-by-side, without overwriting original data

  • Synthetic Test Data: Yes

  • Data Virtualization: Yes
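Preserving referential integrity during masking generally means that the same original key maps to the same masked key in every table, so joins still work. The sketch below illustrates one common way to achieve that with a keyed hash. It is an assumption-laden example, not Dataveil's algorithm; the key, table contents, and `mask_id` helper are hypothetical.

```python
import hashlib
import hmac

# Hypothetical secret; in practice it would be stored outside the code base.
SECRET_KEY = b"hypothetical-masking-key"


def mask_id(value, key=SECRET_KEY):
    """Deterministically replace an identifier with a keyed-hash pseudonym."""
    digest = hmac.new(key, str(value).encode("utf-8"), hashlib.sha256).hexdigest()
    return "C" + digest[:12]


# Hypothetical parent and child tables linked by customer_id.
customers = [
    {"customer_id": 101, "name": "Alice Example"},
    {"customer_id": 102, "name": "Bob Example"},
]
orders = [
    {"order_id": 1, "customer_id": 101},
    {"order_id": 2, "customer_id": 101},
    {"order_id": 3, "customer_id": 102},
]

masked_customers = [
    {**c, "customer_id": mask_id(c["customer_id"]), "name": "REDACTED"}
    for c in customers
]
masked_orders = [{**o, "customer_id": mask_id(o["customer_id"])} for o in orders]

# Every masked order still points at an existing masked customer.
known_ids = {c["customer_id"] for c in masked_customers}
print(all(o["customer_id"] in known_ids for o in masked_orders))
```

Because the mapping is deterministic for a given key, the foreign-key relationship survives masking, while the original identifiers no longer appear in either table.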


Pricing: https://www.dataveil.com/data-masking-software-prices/


User Guide: https://www.dataveil.com/user-guide/

