
OpenRefine
Headquarters: NA
Founded: 2010
Funding: NA
Employees: 200
Private | Public: NA
COTS | OSS: OSS
Use cases: ETL Masking, GTM Insights
Project name: OpenRefine
Year project was started: 2010
Number of contributors: Over 200 contributors
Founders: Metaweb Technologies, Google
URL to GitHub repository: https://github.com/OpenRefine/OpenRefine
Brief description of the project: OpenRefine, formerly known as Google Refine, is an open-source data cleaning and transformation tool. It provides a browser-based interface for exploring, cleaning, and transforming messy data so that the data is easier to analyze and process.
Brief description of the data masking capabilities: OpenRefine focuses on data cleaning and transformation and has no dedicated data masking features. Its transformation functions can still overwrite, hash, or redact sensitive values, so it can serve as a masking step within a broader data pipeline.
List of compatible data source types: OpenRefine can import CSV and other delimited text files, Excel spreadsheets, JSON files, XML files, databases (via JDBC), and data retrieved from web URLs.
Support for anonymization models that limit re-identification risk, like k-anonymity and l-diversity: OpenRefine does not implement k-anonymity or l-diversity. It focuses on data cleaning and transformation rather than privacy-preserving techniques.
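Because OpenRefine does not compute these measures, a check would have to run outside the tool. The following is a minimal sketch in plain Python of what the two measures count, assuming the data has already been exported as rows; the column names (`zip`, `age`, `diagnosis`) and the sample values are hypothetical and not taken from OpenRefine.

```python
from collections import defaultdict


def k_anonymity(rows, quasi_identifiers):
    """Return the size of the smallest group of rows that share the same
    quasi-identifier values (0 for an empty dataset)."""
    groups = defaultdict(int)
    for row in rows:
        key = tuple(row[c] for c in quasi_identifiers)
        groups[key] += 1
    return min(groups.values()) if groups else 0


def l_diversity(rows, quasi_identifiers, sensitive):
    """Return the smallest number of distinct sensitive values found
    in any quasi-identifier group (0 for an empty dataset)."""
    groups = defaultdict(set)
    for row in rows:
        key = tuple(row[c] for c in quasi_identifiers)
        groups[key].add(row[sensitive])
    return min(len(values) for values in groups.values()) if groups else 0


# Hypothetical, already-generalized sample rows.
rows = [
    {"zip": "021**", "age": "30-39", "diagnosis": "flu"},
    {"zip": "021**", "age": "30-39", "diagnosis": "asthma"},
    {"zip": "946**", "age": "40-49", "diagnosis": "flu"},
    {"zip": "946**", "age": "40-49", "diagnosis": "flu"},
]

K = k_anonymity(rows, ["zip", "age"])
L = l_diversity(rows, ["zip", "age"], "diagnosis")
```

In this sample every quasi-identifier combination appears twice, but one group contains a single diagnosis value, so the dataset would satisfy 2-anonymity without satisfying 2-diversity.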
Support for custom masking: OpenRefine has no predefined masking rules, but users can write their own transformation expressions (in GREL, Jython, or Clojure) and apply them to columns to mask sensitive data.
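As an illustration of the kind of rule a user might define, here is a minimal sketch of two masking functions in plain Python 3. The function names, the salt value, and the masking formats are assumptions made for this example; they are not OpenRefine built-ins, and adapting them to an OpenRefine Jython expression would need to be checked separately.

```python
import hashlib


def mask_email(value, salt="example-salt"):
    """Replace the local part of an email address with a salted SHA-256
    digest prefix, keeping the domain. Non-email values pass through."""
    if value is None or "@" not in value:
        return value
    local, domain = value.rsplit("@", 1)
    digest = hashlib.sha256((salt + local).encode("utf-8")).hexdigest()[:12]
    return digest + "@" + domain


def mask_digits(value, keep_last=4):
    """Replace every digit except the last `keep_last` with 'X',
    leaving separators such as '-' or spaces in place."""
    if value is None:
        return value
    total = sum(ch.isdigit() for ch in value)
    seen = 0
    out = []
    for ch in value:
        if ch.isdigit():
            seen += 1
            out.append(ch if seen > total - keep_last else "X")
        else:
            out.append(ch)
    return "".join(out)
```

The hash-based rule is deterministic, so the same input always maps to the same masked value, which keeps joins on the masked column possible. The digit rule keeps the original format so downstream validation on length and separators still passes.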
Ability to discover and classify sensitive data: OpenRefine does not have built-in capabilities for automatic discovery and classification of sensitive data. Users need to manually identify and define sensitive data based on their domain knowledge.
API availability: OpenRefine exposes an HTTP API with JSON responses that allows users to interact with OpenRefine and automate its tasks programmatically.
Integration with CI/CD solutions: OpenRefine has no dedicated CI/CD integration, but pipeline scripts can call its HTTP API or replay an exported operation history as one step of the data cleaning and transformation process.
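A minimal sketch of how a script might address this API, assuming a local OpenRefine instance at `http://127.0.0.1:3333/` and the `command/core/...` endpoint paths used by recent releases. The helper names `command_url` and `list_projects` are hypothetical, and the endpoints and their parameters should be checked against the documentation for the installed version.

```python
import json
from urllib.parse import urlencode, urljoin
from urllib.request import urlopen

BASE_URL = "http://127.0.0.1:3333/"  # assumption: default local address


def command_url(command, base_url=BASE_URL, **params):
    """Build the URL for a core command endpoint, with optional
    query parameters appended in the order given."""
    url = urljoin(base_url, "command/core/" + command)
    if params:
        url += "?" + urlencode(params)
    return url


def list_projects(base_url=BASE_URL):
    """Fetch project metadata as parsed JSON.
    Requires a running OpenRefine instance; not called at import time."""
    with urlopen(command_url("get-all-project-metadata", base_url)) as resp:
        return json.load(resp)
```

Keeping URL construction separate from the network call makes the request logic testable without a running server.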
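One way such a pipeline step could look is a gate that inspects the exported file and fails the build if values that should have been masked are still present. This is a minimal sketch under stated assumptions: the export is a CSV, the column name `ssn` and the pattern are hypothetical, and nothing here is an OpenRefine feature.

```python
import csv
import io
import re

# Assumption: unmasked values look like a US social security number.
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def find_unmasked(csv_text, columns, pattern=SSN_RE):
    """Return (row_number, column) pairs whose cell still matches the
    pattern. Row numbers count the header as row 1."""
    hits = []
    reader = csv.DictReader(io.StringIO(csv_text))
    for row_number, row in enumerate(reader, start=2):
        for column in columns:
            if pattern.search(row.get(column) or ""):
                hits.append((row_number, column))
    return hits


# Hypothetical export with one row that was not masked.
SAMPLE = "name,ssn\nA,XXX-XX-6789\nB,123-45-6789\n"
HITS = find_unmasked(SAMPLE, ["ssn"])
```

A CI job could exit with a non-zero status whenever the returned list is non-empty, which stops unmasked data from reaching later stages.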
Deployment in containers like Docker: OpenRefine can be deployed in containers like Docker, facilitating easy deployment and management.
Maintenance of data masking configurations in source code management: OpenRefine has no built-in integration with source code management. However, the operation history of a project can be exported as JSON, so users can keep those files under version control using standard source code management practices.
Support for subsetting: OpenRefine has no dedicated subsetting features. Facets and text filters can narrow a project to matching rows, and exports include only those rows, but this selection is manual.
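A minimal sketch of how a masking step might be kept as a reviewable file, assuming the JSON layout that OpenRefine produces when an operation history is extracted (a list of operation objects such as `core/text-transform`). The helper names, the `email` column, and the choice of the GREL `sha1` function are illustrative assumptions; the exact fields should be compared with a history exported from the installed version.

```python
import json


def text_transform(column, expression, description=None):
    """Build one text-transform operation in the shape of an
    extracted OpenRefine operation history entry."""
    return {
        "op": "core/text-transform",
        "engineConfig": {"facets": [], "mode": "row-based"},
        "columnName": column,
        "expression": expression,
        "onError": "keep-original",
        "repeat": False,
        "repeatCount": 10,
        "description": description or f"Mask column {column}",
    }


def dump_operations(operations):
    """Serialize with sorted keys and fixed indentation so that
    version-control diffs stay small and stable."""
    return json.dumps(operations, indent=2, sort_keys=True) + "\n"


# Hypothetical masking configuration: hash the email column.
OPERATIONS = [text_transform("email", "grel:sha1(value)")]
OPERATIONS_JSON = dump_operations(OPERATIONS)
```

Writing `OPERATIONS_JSON` to a file in the repository gives reviewers a plain-text record of which columns are masked and how.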
Support for referential integrity: OpenRefine does not have built-in support for enforcing referential integrity constraints as it primarily focuses on data cleaning and transformation.
Support for automated provisioning of test environments and data: OpenRefine does not provide specific features for automated provisioning of test environments and data. However, it can be integrated into automated workflows through scripting and automation.
Production of data masking reports: OpenRefine does not produce specific data masking reports. However, its undo/redo history records each operation applied to a project, which gives a trail of the cleaning and transformation steps.
Support for synthetic creation of test data: OpenRefine does not explicitly support synthetic creation of test data. Its main focus is on data cleaning and transformation.
Support for data virtualization: OpenRefine does not explicitly support data virtualization.