Permissioned Blockchain-based Framework for Ranking Synthetic Data Generators
Abstract
Synthetic data generation is increasingly recognized as a crucial solution to address data related challenges such as scarcity, bias, and privacy concerns. As synthetic data proliferates, the need for a robust evaluation framework to select a synthetic data generator becomes more pressing given the variety of options available. In this research study, we investigate two primary questions: 1) How can we select the most suitable synthetic data generator from a set of options for a specific purpose? 2) How can we make the selection process more transparent, accountable, and auditable? To address these questions, we introduce a novel approach in which the proposed ranking algorithm is implemented as a smart contract within a permissioned blockchain framework called Sawtooth. Through comprehensive experiments and comparisons with state-of-the-art baseline ranking solutions, our framework demonstrates its effectiveness in providing nuanced rankings that consider both desirable and undesirable properties. Furthermore, our framework serves as a valuable tool for selecting the optimal synthetic data generators for specific needs while ensuring compliance with data protection principles.
- Publication:
-
arXiv e-prints
- Pub Date:
- May 2024
- DOI:
- 10.48550/arXiv.2405.07196
- arXiv:
- arXiv:2405.07196
- Bibcode:
- 2024arXiv240507196R
- Keywords:
-
- Computer Science - Databases;
- Computer Science - Cryptography and Security;
- Computer Science - Distributed;
- Parallel;
- and Cluster Computing;
- Computer Science - Machine Learning