Imperva (Now through Mage)
HEADQUARTERS: Redwood City, CA
FOUNDED: 2002
FUNDING: Not specified
FOUNDERS: Shlomo Kramer, Amichai Shulman, Mickey Boodaei
EMPLOYEES: Not specified
PRIVATE | PUBLIC: Private
COTS | OSS: COTS
USE CASES
Enterprise Data Management (EDM), Test Data Creation (TDC), Test Data Management (TDM)
GTM Insights
GTM Domain Insights
Data de-identification is an evolving domain, driven by customer expectations that are codified in privacy regulations (e.g., GDPR, PIPL). Many use cases require de-identification, and the emphasis on re-identification risk keeps increasing.
Taking GDPR as the leading example of privacy regulation expectations: even if all direct identifiers are stripped out of a data set, the data is still considered personal data if any data subject can be linked to information in the data set relating to them (per Recital 26 GDPR). In other words, under GDPR a person does not have to be named to be identifiable. If other information allows an individual to be connected to data about them, they may still be considered ‘identified’.
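The linkage risk described above can be illustrated with a small k-anonymity style check. This is a minimal sketch, not any vendor's method: the rows, the column names, and the choice of `zip`, `birth_year`, and `sex` as quasi-identifiers are all assumptions made up for illustration.

```python
from collections import Counter

# Hypothetical records with direct identifiers (name, email) already removed.
ROWS = [
    {"zip": "94063", "birth_year": 1980, "sex": "F", "diagnosis": "asthma"},
    {"zip": "94063", "birth_year": 1980, "sex": "F", "diagnosis": "flu"},
    {"zip": "94063", "birth_year": 1975, "sex": "M", "diagnosis": "diabetes"},
]
QUASI_IDENTIFIERS = ("zip", "birth_year", "sex")  # assumed for this example


def group_sizes(rows, quasi_identifiers):
    """Count how many rows share each combination of quasi-identifier values."""
    return Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)


def k_anonymity(rows, quasi_identifiers):
    """Return k: the size of the smallest group sharing the same quasi-identifiers."""
    return min(group_sizes(rows, quasi_identifiers).values())


def singled_out(rows, quasi_identifiers):
    """Return rows whose quasi-identifier combination is unique in the data set."""
    sizes = group_sizes(rows, quasi_identifiers)
    return [
        row for row in rows
        if sizes[tuple(row[q] for q in quasi_identifiers)] == 1
    ]
```

Here the third row is the only one with its combination of ZIP code, birth year, and sex, so anyone who knows those three facts about a person can connect them to the diagnosis even though no name is stored.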
Of the vendors in this space, here are some notables:
Delphix - Delphix's differentiation is its snapshot-tree approach to virtual databases, which creates whole, unique databases with a minimal storage footprint. Its engines can be deployed ephemerally, it offers a broad set of masking generators, and it integrates with CI/CD for automation. Its challenge is that it does NOT offer synthetic data (on the roadmap) and does NOT provide data subsetting (not on the roadmap). Best practice is to subset data to the specific test being conducted, and to use synthetic data for some functional requirements (FRs: unit and system tests) and non-functional requirements (NFRs: load, endurance, and performance tests).
Use Delphix if you need a full copy of databases in the organization
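To make the subsetting practice mentioned above concrete, here is a minimal sketch of cutting a test subset while keeping referential integrity. The `customers` and `orders` tables, their column names, and the helper functions are hypothetical and do not reflect any vendor's product or API.

```python
# Hypothetical parent and child tables; orders reference customers by customer_id.
CUSTOMERS = [
    {"id": 1, "region": "EU"},
    {"id": 2, "region": "US"},
    {"id": 3, "region": "EU"},
]
ORDERS = [
    {"order_id": 10, "customer_id": 1},
    {"order_id": 11, "customer_id": 2},
    {"order_id": 12, "customer_id": 3},
    {"order_id": 13, "customer_id": 1},
]


def subset_by_customers(customers, orders, keep_ids):
    """Keep the selected parent rows and only the child rows that point at them."""
    keep = set(keep_ids)
    sub_customers = [c for c in customers if c["id"] in keep]
    sub_orders = [o for o in orders if o["customer_id"] in keep]
    return sub_customers, sub_orders


def orphaned_orders(customers, orders):
    """Return child rows whose foreign key has no matching parent row."""
    known_ids = {c["id"] for c in customers}
    return [o for o in orders if o["customer_id"] not in known_ids]


# Select only EU customers for a region-specific test.
EU_IDS = [c["id"] for c in CUSTOMERS if c["region"] == "EU"]
SUB_CUSTOMERS, SUB_ORDERS = subset_by_customers(CUSTOMERS, ORDERS, EU_IDS)
```

Filtering the child table by the same key set as the parent is what keeps the subset free of orphaned rows; sampling each table independently would not give that guarantee.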
Tonic.ai - Tonic's differentiation is its synthetic data generation (best in market), with a very broad set of generators, and its ability to provide differential privacy. The solution can evaluate new combinations of data in test sets to determine whether row linking or singling out has become possible, and then add calibrated noise to the aggregated data set. Tonic's challenge is that it does NOT provide data virtualization or virtual database capability; it focuses ONLY on synthetic data generation and privacy.
Use Tonic if you favor data security (synthetic & differential privacy), though be cautious with 'usability' of data
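As an illustration of what "calibrated noise" means in differential privacy, the sketch below applies the standard Laplace mechanism to a count query. It is a generic textbook technique, not Tonic's implementation; the function names and the choice of a count query are assumptions for this example.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) noise.

    The difference of two independent exponential variables with mean `scale`
    follows a Laplace distribution centered at zero with that scale.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_count(true_count: int, epsilon: float, rng: random.Random) -> float:
    """Return a count with Laplace noise calibrated to the privacy budget epsilon."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    # Adding or removing one person changes a count by at most 1.
    sensitivity = 1.0
    return true_count + laplace_noise(sensitivity / epsilon, rng)


# Usage: a smaller epsilon means more noise and stronger privacy.
rng = random.Random(0)  # seeded only so the example is repeatable
released = noisy_count(100, epsilon=0.5, rng=rng)
```

The noise scale is the query's sensitivity divided by epsilon, which is the trade-off behind the 'usability' caution above: stronger privacy guarantees make individual released values less accurate.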
K2View - K2View's differentiation is its data virtualization. Creating a composite view from many data sources is a powerful abstraction layer. K2View generates synthetic data across a broad set of sources and, because it owns the abstraction layer, it can also perform dynamic masking, which Tonic and Delphix cannot. K2View's challenge is its general applicability: it does not go as deep into data security as Tonic, nor is it as focused on creating copies of underlying databases as Delphix.
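Dynamic masking differs from static masking in that the stored data is left unchanged and values are masked at read time, depending on who is asking. The sketch below shows the idea in an access layer; the role names, field names, and helper functions are hypothetical and are not K2View's API.

```python
def mask_email(value: str) -> str:
    """Keep the first character and the domain, hide the rest of the local part."""
    local, _, domain = value.partition("@")
    return local[:1] + "***@" + domain


def mask_all(value: str) -> str:
    """Replace every character with an asterisk."""
    return "*" * len(value)


# Assumed masking policy: which fields are masked, and how.
MASKED_FIELDS = {"email": mask_email, "ssn": mask_all}
PRIVILEGED_ROLES = frozenset({"dba"})  # assumed roles that may see clear values


def read_row(row: dict, role: str) -> dict:
    """Return a copy of the row, masked at read time unless the role is privileged."""
    if role in PRIVILEGED_ROLES:
        return dict(row)
    return {
        key: MASKED_FIELDS[key](value) if key in MASKED_FIELDS else value
        for key, value in row.items()
    }


STORED = {"id": 7, "email": "alice@example.com", "ssn": "123-45-6789"}
analyst_view = read_row(STORED, role="analyst")
dba_view = read_row(STORED, role="dba")
```

Because masking happens in the layer that serves the data, the same source can present different views to different consumers, which is why owning the abstraction layer makes this capability possible.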
Imperva has partnered with Mage for Data Masking
Source Types:
Databases (such as Oracle, Microsoft SQL Server, MySQL, and more), file systems, big data platforms, and cloud storage environments
Masking (classifiers & generators):
K-Anonymity and L-Diversity: Not specified
Custom Masking: Yes
Data Classification: Yes
Automation Support:
API Support: Yes
CI/CD Integrations: Yes
Deployment Types:
SaaS or On-Premises: Both
Container Deployment: Yes
Data Management:
Source Code Management: Yes
Subsetting: Yes
Referential Integrity: Yes
Automated Provisioning: Yes
Data Masking Reports: Yes (masked data, compliance status, and auditing information)
Synthetic Test Data: Yes
Data Virtualization: Yes
https://www.imperva.com/resources/resource-library/datasheets/mage-static-data-masking/